problem_test_code.py
Namespace(batch_size=10, ignore_label=0, input_size='228, 304', learning_rate=0.00025, model='model_joint4', momentum=0.9, num_classes=4, num_steps=15001, power=0.9, random_mirror=False, random_scale=False, random_seed=1234, restore_model=False, save_num_images=2, save_pred_every=1000, train_list='', weight_decay=0.0005) | |
begin test | |
Model size:3,600,248 | |
Model size:7,209,840 | |
2017-09-16 17:03:50.563408: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations. | |
2017-09-16 17:03:50.563431: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations. | |
2017-09-16 17:03:50.563439: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations. | |
2017-09-16 17:03:50.563445: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations. | |
2017-09-16 17:03:50.563451: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations. | |
2017-09-16 17:03:51.268766: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: | |
name: TITAN X (Pascal) | |
major: 6 minor: 1 memoryClockRate (GHz) 1.531 | |
pciBusID 0000:08:00.0 | |
Total memory: 11.90GiB | |
Free memory: 11.76GiB | |
2017-09-16 17:03:51.268825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 | |
2017-09-16 17:03:51.268834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y | |
2017-09-16 17:03:51.268849: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:08:00.0) | |
2017-09-16 17:03:54.261152: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 822.66MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. | |
2017-09-16 17:03:54.264369: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 815.94MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. | |
2017-09-16 17:03:54.273886: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 885.94MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. | |
2017-09-16 17:03:54.273918: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.60GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. | |
2017-09-16 17:03:54.273934: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.10GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. | |
2017-09-16 17:03:54.288580: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.11GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. | |
2017-09-16 17:03:54.288611: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.57GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. | |
2017-09-16 17:03:54.305776: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.19GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. | |
2017-09-16 17:03:54.338472: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.49GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. | |
2017-09-16 17:03:54.338534: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 815.94MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. | |
2017-09-16 17:04:04.350564: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 177.19MiB. Current allocation summary follows. | |
2017-09-16 17:04:04.350611: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256): Total Chunks: 3, Chunks in use: 0 768B allocated for chunks. 64B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
2017-09-16 17:04:04.350629: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512): Total Chunks: 3, Chunks in use: 0 1.8KiB allocated for chunks. 516B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
2017-09-16 17:04:04.350647: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024): Total Chunks: 4, Chunks in use: 0 5.5KiB allocated for chunks. 1.8KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
2017-09-16 17:04:04.350664: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048): Total Chunks: 1, Chunks in use: 0 3.2KiB allocated for chunks. 2.9KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
2017-09-16 17:04:04.350680: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
2017-09-16 17:04:04.350695: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
2017-09-16 17:04:04.350710: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
2017-09-16 17:04:04.350726: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768): Total Chunks: 1, Chunks in use: 0 46.2KiB allocated for chunks. 1.0KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
2017-09-16 17:04:04.350742: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536): Total Chunks: 1, Chunks in use: 0 64.0KiB allocated for chunks. 2.0KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
2017-09-16 17:04:04.350757: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
2017-09-16 17:04:04.350773: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144): Total Chunks: 1, Chunks in use: 0 260.0KiB allocated for chunks. 2.50MiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
2017-09-16 17:04:04.350788: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
2017-09-16 17:04:04.350803: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
2017-09-16 17:04:04.350858: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
2017-09-16 17:04:04.350880: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304): Total Chunks: 1, Chunks in use: 0 4.28MiB allocated for chunks. 2.0KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
2017-09-16 17:04:04.350897: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608): Total Chunks: 1, Chunks in use: 0 14.00MiB allocated for chunks. 2.0KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
2017-09-16 17:04:04.350912: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
2017-09-16 17:04:04.350927: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
2017-09-16 17:04:04.350944: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864): Total Chunks: 1, Chunks in use: 0 126.56MiB allocated for chunks. 126.56MiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
2017-09-16 17:04:04.350958: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
2017-09-16 17:04:04.350973: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. | |
... | |
2017-09-16 17:04:24.406910: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102876fd200 of size 44359680
2017-09-16 17:04:24.406920: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1028a14b200 of size 44359680 | |
2017-09-16 17:04:24.406930: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1028cb99200 of size 11089920 | |
2017-09-16 17:04:24.406940: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1028d62ca00 of size 44359680 | |
2017-09-16 17:04:24.406950: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1029007aa00 of size 22179840 | |
2017-09-16 17:04:24.406960: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102915a1a00 of size 258048 | |
2017-09-16 17:04:24.406970: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10291917900 of size 3870720 | |
2017-09-16 17:04:24.406980: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10292ac8a00 of size 77629440 | |
2017-09-16 17:04:24.406990: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102974d1200 of size 88719360 | |
2017-09-16 17:04:24.407000: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1029c96d200 of size 88719360 | |
2017-09-16 17:04:24.407010: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102a1e09200 of size 11089920 | |
2017-09-16 17:04:24.407020: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102a289ca00 of size 11089920 | |
2017-09-16 17:04:24.407031: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102a3330200 of size 33269760 | |
2017-09-16 17:04:24.407041: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102a52eaa00 of size 88719360 | |
2017-09-16 17:04:24.407052: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102aa786a00 of size 46694400 | |
2017-09-16 17:04:24.407062: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102ad40ea00 of size 44359680 | |
2017-09-16 17:04:24.407072: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102afe5ca00 of size 44359680 | |
2017-09-16 17:04:24.407082: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102b28aaa00 of size 44359680 | |
2017-09-16 17:04:24.407092: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102b52f8a00 of size 44359680 | |
2017-09-16 17:04:24.407102: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102b7d46a00 of size 88719360 | |
2017-09-16 17:04:24.407111: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102bd1e2a00 of size 11089920 | |
2017-09-16 17:04:24.407121: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102bdc76200 of size 22179840 | |
2017-09-16 17:04:24.407131: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102bf19d200 of size 88719360 | |
2017-09-16 17:04:24.407141: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102c4639200 of size 88719360 | |
2017-09-16 17:04:24.407151: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102c9ad5200 of size 597196800 | |
2017-09-16 17:04:24.407162: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102ed45d200 of size 353894400 | |
2017-09-16 17:04:24.407172: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x103025dd200 of size 88719360 | |
2017-09-16 17:04:24.407183: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10307a79200 of size 3870720 | |
2017-09-16 17:04:24.407193: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x103086d1000 of size 88719360 | |
2017-09-16 17:04:24.407203: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1030db6d000 of size 22179840 | |
2017-09-16 17:04:24.407213: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1030f094000 of size 44359680 | |
2017-09-16 17:04:24.407222: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10311ae2000 of size 88719360 | |
2017-09-16 17:04:24.407232: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10316f7e000 of size 44359680 | |
2017-09-16 17:04:24.407242: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x103199cc000 of size 88719360 | |
2017-09-16 17:04:24.407252: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1031ee68000 of size 44359680 | |
2017-09-16 17:04:24.407262: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x103218b6000 of size 88719360 | |
2017-09-16 17:04:24.407273: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10326d52000 of size 132710400 | |
2017-09-16 17:04:24.407288: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020f879600 of size 65536 | |
2017-09-16 17:04:24.407310: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020feaba00 of size 256 | |
2017-09-16 17:04:24.407320: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020feabc00 of size 1536 | |
2017-09-16 17:04:24.407330: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020feac300 of size 512 | |
2017-09-16 17:04:24.407340: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020feac600 of size 256 | |
2017-09-16 17:04:24.407351: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020feaca00 of size 1024 | |
2017-09-16 17:04:24.407361: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020fead000 of size 512 | |
2017-09-16 17:04:24.407371: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020fead300 of size 1792 | |
2017-09-16 17:04:24.407381: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020feae000 of size 3328 | |
2017-09-16 17:04:24.407391: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020feaf100 of size 1280 | |
2017-09-16 17:04:24.407403: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020ff0a600 of size 768 | |
2017-09-16 17:04:24.407413: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020ff0e500 of size 256 | |
2017-09-16 17:04:24.407423: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020ff12a00 of size 47360 | |
2017-09-16 17:04:24.407434: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x102915e0a00 of size 3370752 | |
2017-09-16 17:04:24.407445: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x10291cc8900 of size 14680320 | |
2017-09-16 17:04:24.407455: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x10307e2a200 of size 9072128 | |
2017-09-16 17:04:24.407465: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1032ebe2000 of size 258046208 | |
2017-09-16 17:04:24.407475: I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size: | |
2017-09-16 17:04:24.407490: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 241 Chunks of size 256 totalling 60.2KiB | |
2017-09-16 17:04:24.407503: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 26 Chunks of size 512 totalling 13.0KiB | |
2017-09-16 17:04:24.407516: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 119 Chunks of size 1024 totalling 119.0KiB | |
2017-09-16 17:04:24.407528: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB | |
2017-09-16 17:04:24.407539: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1536 totalling 1.5KiB | |
2017-09-16 17:04:24.407552: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 52 Chunks of size 2048 totalling 104.0KiB | |
2017-09-16 17:04:24.407563: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 16384 totalling 64.0KiB | |
2017-09-16 17:04:24.407576: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 37632 totalling 73.5KiB | |
2017-09-16 17:04:24.407588: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 64512 totalling 63.0KiB | |
2017-09-16 17:04:24.407600: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 19 Chunks of size 65536 totalling 1.19MiB | |
2017-09-16 17:04:24.407612: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 66304 totalling 64.8KiB | |
2017-09-16 17:04:24.407624: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 81920 totalling 80.0KiB | |
2017-09-16 17:04:24.407636: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 11 Chunks of size 147456 totalling 1.55MiB | |
2017-09-16 17:04:24.407648: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 241920 totalling 236.2KiB | |
2017-09-16 17:04:24.407660: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 14 Chunks of size 258048 totalling 3.45MiB | |
2017-09-16 17:04:24.407672: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 258304 totalling 756.8KiB | |
2017-09-16 17:04:24.407684: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 10 Chunks of size 262144 totalling 2.50MiB | |
2017-09-16 17:04:24.407696: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 266240 totalling 260.0KiB | |
2017-09-16 17:04:24.407707: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 294912 totalling 1.12MiB | |
2017-09-16 17:04:24.407719: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 322816 totalling 315.2KiB | |
2017-09-16 17:04:24.407732: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 21 Chunks of size 524288 totalling 10.50MiB | |
2017-09-16 17:04:24.407743: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 589824 totalling 2.25MiB | |
2017-09-16 17:04:24.407755: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 1179648 totalling 3.38MiB | |
2017-09-16 17:04:24.407767: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1835008 totalling 1.75MiB | |
2017-09-16 17:04:24.407779: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 11 Chunks of size 2359296 totalling 24.75MiB | |
2017-09-16 17:04:24.407791: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 3870720 totalling 7.38MiB | |
2017-09-16 17:04:24.407802: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 3924992 totalling 3.74MiB | |
2017-09-16 17:04:24.407814: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 8317440 totalling 7.93MiB | |
2017-09-16 17:04:24.407826: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 17 Chunks of size 11089920 totalling 179.79MiB | |
2017-09-16 17:04:24.407839: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 7 Chunks of size 22179840 totalling 148.07MiB | |
2017-09-16 17:04:24.407851: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 33269760 totalling 31.73MiB | |
2017-09-16 17:04:24.407863: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 30 Chunks of size 44359680 totalling 1.24GiB | |
2017-09-16 17:04:24.407875: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 45137920 totalling 43.05MiB | |
2017-09-16 17:04:24.407887: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 46694400 totalling 44.53MiB | |
2017-09-16 17:04:24.407899: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 50790400 totalling 48.44MiB | |
2017-09-16 17:04:24.407911: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 52741120 totalling 50.30MiB | |
2017-09-16 17:04:24.407923: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 77629440 totalling 148.07MiB | |
2017-09-16 17:04:24.407935: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 17 Chunks of size 88719360 totalling 1.40GiB | |
2017-09-16 17:04:24.407947: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 99809280 totalling 95.19MiB | |
2017-09-16 17:04:24.407959: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 132710400 totalling 126.56MiB | |
2017-09-16 17:04:24.407972: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 353894400 totalling 337.50MiB | |
2017-09-16 17:04:24.407984: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 597196800 totalling 569.53MiB | |
2017-09-16 17:04:24.407996: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 4.50GiB | |
2017-09-16 17:04:24.408010: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats: | |
Limit: 5112830361 | |
InUse: 4827536384 | |
MaxInUse: 5112738816 | |
NumAllocs: 1291 | |
MaxAllocSize: 4275296000 | |
2017-09-16 17:04:24.408085: W tensorflow/core/common_runtime/bfc_allocator.cc:277] *********************************************************************xxxx**********************_____ | |
2017-09-16 17:04:24.408109: W tensorflow/core/framework/op_kernel.cc:1158] Resource exhausted: OOM when allocating tensor with shape[5760,512,5,6] | |
Traceback (most recent call last): | |
File "bug1.py", line 558, in <module> | |
train(args) | |
File "bug1.py", line 507, in train | |
sess.run(train_op, feed_dict=feed_dict) | |
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 789, in run | |
run_metadata_ptr) | |
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 997, in _run | |
feed_dict_string, options, run_metadata) | |
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run | |
target_list, options, run_metadata) | |
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call | |
raise type(e)(node_def, op, message) | |
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1440,512,7,9] | |
[[Node: gradients/fc1_voc12_c1/convolution_grad/Conv2DBackpropInput = Conv2DBackpropInput[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/fc1_voc12_c1/convolution_grad/Shape, fc1_voc12_c1/weights/read, gradients/fc1_voc12_c1/convolution/BatchToSpaceND_grad/SpaceToBatchND)]] | |
Caused by op u'gradients/fc1_voc12_c1/convolution_grad/Conv2DBackpropInput', defined at: | |
File "bug1.py", line 558, in <module> | |
train(args) | |
File "bug1.py", line 490, in train | |
grads = tf.gradients(reduced_loss_with_l2, tf.trainable_variables()) | |
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 540, in gradients | |
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads)) | |
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 346, in _MaybeCompile | |
return grad_fn() # Exit early | |
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 540, in <lambda> | |
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads)) | |
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/ops/nn_grad.py", line 445, in _Conv2DGrad | |
op.get_attr("data_format")), | |
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 488, in conv2d_backprop_input | |
data_format=data_format, name=name) | |
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op | |
op_def=op_def) | |
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op | |
original_op=self._default_original_op, op_def=op_def) | |
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__ | |
self._traceback = _extract_stack() | |
...which was originally created as op u'fc1_voc12_c1/convolution', defined at: | |
File "bug1.py", line 558, in <module> | |
train(args) | |
File "bug1.py", line 458, in train | |
is_training=is_training, num_classes=num_classes) | |
File "bug1.py", line 78, in __init__ | |
self.setup(is_training, num_classes) | |
File "bug1.py", line 408, in setup | |
.atrous_conv(3, 3, num_classes, 12, padding='SAME', relu=False, name='fc1_voc12_c1')) | |
File "bug1.py", line 53, in layer_decorated | |
layer_output = op(self, layer_input, *args, **kwargs) | |
File "bug1.py", line 203, in atrous_conv | |
output = convolve(input, kernel) | |
File "bug1.py", line 198, in <lambda> | |
convolve = lambda i, k: tf.nn.atrous_conv2d(i, k, dilation, padding=padding) | |
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 972, in atrous_conv2d | |
name=name) | |
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 670, in convolution | |
op=op) | |
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 453, in with_space_to_batch | |
result = op(input_converted, num_spatial_dims, "VALID") | |
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1440,512,7,9] | |
[[Node: gradients/fc1_voc12_c1/convolution_grad/Conv2DBackpropInput = Conv2DBackpropInput[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/fc1_voc12_c1/convolution_grad/Shape, fc1_voc12_c1/weights/read, gradients/fc1_voc12_c1/convolution/BatchToSpaceND_grad/SpaceToBatchND)]] |
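A note on the failing shapes, reading the log above: tf.nn.atrous_conv2d wraps the dense convolution in SpaceToBatchND/BatchToSpaceND (visible in the traceback), and SpaceToBatchND with block size equal to the dilation rate multiplies the batch dimension by rate*rate. With batch size 10 that gives 10 * 12 * 12 = 1440 for fc1_voc12_c1 (rate 12) and 10 * 24 * 24 = 5760 for fc1_voc12_c3 (rate 24), matching the leading dimension in the two OOM messages, and 512 is the channel count of res5c_relu. A quick sanity check, assuming float32 tensors:

from __future__ import print_function

# Rough check of the failed allocations reported above (illustrative only).
batch = 10
for name, rate in [('fc1_voc12_c1', 12), ('fc1_voc12_c3', 24)]:
    print(name, 'expanded batch =', batch * rate * rate)     # 1440 and 5760
print(1440 * 512 * 7 * 9 * 4.0 / 2 ** 20, 'MiB')             # 177.1875, i.e. the 177.19MiB allocation that failed

The self-contained script that reproduces the error follows.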
from __future__ import print_function | |
import numpy as np | |
import tensorflow as tf | |
# TODO change | |
IMG_MEAN = np.array((104.00698793, 116.66876762, 122.67891434), dtype=np.float32) | |
BATCH_SIZE = 10 | |
IGNORE_LABEL = 0 | |
INPUT_SIZE = '228, 304' | |
LEARNING_RATE = 2.5e-4 | |
MOMENTUM = 0.9 | |
NUM_CLASSES = 4 | |
NUM_STEPS = 15001 | |
POWER = 0.9 | |
RANDOM_SEED = 1234 | |
SAVE_NUM_IMAGES = 2 | |
SAVE_PRED_EVERY = 1000 | |
MAX_TO_KEEP = 2 | |
WEIGHT_DECAY = 0.0005 | |
# from kaffe.tensorflow import Network | |
import numpy as np | |
import tensorflow as tf | |
slim = tf.contrib.slim | |
DEFAULT_PADDING = 'SAME' | |
def layer(op): | |
'''Decorator for composable network layers.''' | |
def layer_decorated(self, *args, **kwargs): | |
# Automatically set a name if not provided. | |
name = kwargs.setdefault('name', self.get_unique_name(op.__name__)) | |
# Figure out the layer inputs. | |
if len(self.terminals) == 0: | |
raise RuntimeError('No input variables found for layer %s.' % name) | |
elif len(self.terminals) == 1: | |
layer_input = self.terminals[0] | |
else: | |
layer_input = list(self.terminals) | |
# Perform the operation and get the output. | |
layer_output = op(self, layer_input, *args, **kwargs) | |
# Add to layer LUT. | |
self.layers[name] = layer_output | |
# This output is now the input for the next layer. | |
self.feed(layer_output) | |
# Return self for chained calls. | |
return self | |
return layer_decorated | |
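# Illustrative note on the decorator: it lets layers be chained fluently, e.g.
#   (net.feed('data')
#        .conv(3, 3, 64, 1, 1, name='example_conv')
#        .relu(name='example_relu'))
# Each decorated op records its output in net.layers[name] and re-feeds it,
# so it becomes the input of the next call in the chain.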
class Network(object): | |
def __init__(self, inputs, trainable=True, is_training=False, num_classes=21): | |
# The input nodes for this network | |
self.inputs = inputs | |
# The current list of terminal nodes | |
self.terminals = [] | |
# Mapping from layer names to layers | |
self.layers = dict(inputs) | |
# If true, the resulting variables are set as trainable | |
self.trainable = trainable | |
# Switch variable for dropout | |
self.use_dropout = tf.placeholder_with_default(tf.constant(1.0), | |
shape=[], | |
name='use_dropout') | |
self.setup(is_training, num_classes) | |
    def setup(self, is_training, num_classes):
'''Construct the network. ''' | |
raise NotImplementedError('Must be implemented by the subclass.') | |
def load(self, data_path, session, ignore_missing=False): | |
'''Load network weights. | |
data_path: The path to the numpy-serialized network weights | |
session: The current TensorFlow session | |
ignore_missing: If true, serialized weights for missing layers are ignored. | |
''' | |
data_dict = np.load(data_path).item() | |
for op_name in data_dict: | |
with tf.variable_scope(op_name, reuse=True): | |
for param_name, data in data_dict[op_name].iteritems(): | |
try: | |
var = tf.get_variable(param_name) | |
session.run(var.assign(data)) | |
except ValueError: | |
if not ignore_missing: | |
raise | |
def feed(self, *args): | |
'''Set the input(s) for the next operation by replacing the terminal nodes. | |
The arguments can be either layer names or the actual layers. | |
''' | |
assert len(args) != 0 | |
self.terminals = [] | |
for fed_layer in args: | |
if isinstance(fed_layer, str): | |
try: | |
fed_layer = self.layers[fed_layer] | |
except KeyError: | |
raise KeyError('Unknown layer name fed: %s' % fed_layer) | |
self.terminals.append(fed_layer) | |
return self | |
def get_output(self): | |
'''Returns the current network output.''' | |
return self.terminals[-1] | |
def get_unique_name(self, prefix): | |
'''Returns an index-suffixed unique name for the given prefix. | |
This is used for auto-generating layer names based on the type-prefix. | |
''' | |
ident = sum(t.startswith(prefix) for t, _ in self.layers.items()) + 1 | |
return '%s_%d' % (prefix, ident) | |
def make_var(self, name, shape): | |
'''Creates a new TensorFlow variable.''' | |
return tf.get_variable(name, shape, trainable=self.trainable) | |
def validate_padding(self, padding): | |
'''Verifies that the padding is one of the supported ones.''' | |
assert padding in ('SAME', 'VALID') | |
@layer | |
def conv(self, | |
input, | |
k_h, | |
k_w, | |
c_o, | |
s_h, | |
s_w, | |
name, | |
relu=True, | |
padding=DEFAULT_PADDING, | |
group=1, | |
biased=True): | |
# Verify that the padding is acceptable | |
self.validate_padding(padding) | |
# Get the number of channels in the input | |
c_i = input.get_shape()[-1] | |
# Verify that the grouping parameter is valid | |
assert c_i % group == 0 | |
assert c_o % group == 0 | |
# Convolution for a given input and kernel | |
convolve = lambda i, k: tf.nn.conv2d(i, k, [1, s_h, s_w, 1], padding=padding) | |
with tf.variable_scope(name) as scope: | |
kernel = self.make_var('weights', shape=[k_h, k_w, c_i / group, c_o]) | |
if group == 1: | |
# This is the common-case. Convolve the input without any further complications. | |
output = convolve(input, kernel) | |
else: | |
# Split the input into groups and then convolve each of them independently | |
input_groups = tf.split(3, group, input) | |
kernel_groups = tf.split(3, group, kernel) | |
output_groups = [convolve(i, k) for i, k in zip(input_groups, kernel_groups)] | |
# Concatenate the groups | |
output = tf.concat(3, output_groups) | |
# Add the biases | |
if biased: | |
biases = self.make_var('biases', [c_o]) | |
output = tf.nn.bias_add(output, biases) | |
if relu: | |
# ReLU non-linearity | |
output = tf.nn.relu(output, name=scope.name) | |
return output | |
@layer | |
def atrous_conv(self, | |
input, | |
k_h, | |
k_w, | |
c_o, | |
dilation, | |
name, | |
relu=True, | |
padding=DEFAULT_PADDING, | |
group=1, | |
biased=True): | |
# Verify that the padding is acceptable | |
self.validate_padding(padding) | |
# Get the number of channels in the input | |
c_i = input.get_shape()[-1] | |
# Verify that the grouping parameter is valid | |
assert c_i % group == 0 | |
assert c_o % group == 0 | |
# Convolution for a given input and kernel | |
convolve = lambda i, k: tf.nn.atrous_conv2d(i, k, dilation, padding=padding) | |
with tf.variable_scope(name) as scope: | |
kernel = self.make_var('weights', shape=[k_h, k_w, c_i / group, c_o]) | |
if group == 1: | |
# This is the common-case. Convolve the input without any further complications. | |
output = convolve(input, kernel) | |
else: | |
# Split the input into groups and then convolve each of them independently | |
input_groups = tf.split(3, group, input) | |
kernel_groups = tf.split(3, group, kernel) | |
output_groups = [convolve(i, k) for i, k in zip(input_groups, kernel_groups)] | |
# Concatenate the groups | |
output = tf.concat(3, output_groups) | |
# Add the biases | |
if biased: | |
biases = self.make_var('biases', [c_o]) | |
output = tf.nn.bias_add(output, biases) | |
if relu: | |
# ReLU non-linearity | |
output = tf.nn.relu(output, name=scope.name) | |
return output | |
@layer | |
def relu(self, input, name): | |
return tf.nn.relu(input, name=name) | |
@layer | |
def max_pool(self, input, k_h, k_w, s_h, s_w, name, padding=DEFAULT_PADDING): | |
self.validate_padding(padding) | |
return tf.nn.max_pool(input, | |
ksize=[1, k_h, k_w, 1], | |
strides=[1, s_h, s_w, 1], | |
padding=padding, | |
name=name) | |
@layer | |
def avg_pool(self, input, k_h, k_w, s_h, s_w, name, padding=DEFAULT_PADDING): | |
self.validate_padding(padding) | |
return tf.nn.avg_pool(input, | |
ksize=[1, k_h, k_w, 1], | |
strides=[1, s_h, s_w, 1], | |
padding=padding, | |
name=name) | |
@layer | |
def lrn(self, input, radius, alpha, beta, name, bias=1.0): | |
return tf.nn.local_response_normalization(input, | |
depth_radius=radius, | |
alpha=alpha, | |
beta=beta, | |
bias=bias, | |
name=name) | |
@layer | |
def concat(self, inputs, axis, name): | |
return tf.concat(concat_dim=axis, values=inputs, name=name) | |
@layer | |
def add(self, inputs, name): | |
return tf.add_n(inputs, name=name) | |
@layer | |
def fc(self, input, num_out, name, relu=True): | |
with tf.variable_scope(name) as scope: | |
input_shape = input.get_shape() | |
if input_shape.ndims == 4: | |
# The input is spatial. Vectorize it first. | |
dim = 1 | |
for d in input_shape[1:].as_list(): | |
dim *= d | |
feed_in = tf.reshape(input, [-1, dim]) | |
else: | |
feed_in, dim = (input, input_shape[-1].value) | |
weights = self.make_var('weights', shape=[dim, num_out]) | |
biases = self.make_var('biases', [num_out]) | |
op = tf.nn.relu_layer if relu else tf.nn.xw_plus_b | |
fc = op(feed_in, weights, biases, name=scope.name) | |
return fc | |
@layer | |
def softmax(self, input, name): | |
input_shape = map(lambda v: v.value, input.get_shape()) | |
if len(input_shape) > 2: | |
# For certain models (like NiN), the singleton spatial dimensions | |
# need to be explicitly squeezed, since they're not broadcast-able | |
# in TensorFlow's NHWC ordering (unlike Caffe's NCHW). | |
if input_shape[1] == 1 and input_shape[2] == 1: | |
input = tf.squeeze(input, squeeze_dims=[1, 2]) | |
else: | |
raise ValueError('Rank 2 tensor input expected for softmax!') | |
return tf.nn.softmax(input, name) | |
@layer | |
def batch_normalization(self, input, name, is_training, activation_fn=None, scale=True): | |
with tf.variable_scope(name) as scope: | |
output = slim.batch_norm( | |
input, | |
activation_fn=activation_fn, | |
is_training=is_training, | |
updates_collections=None, | |
scale=scale, | |
scope=scope) | |
return output | |
@layer | |
def dropout(self, input, keep_prob, name): | |
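        # use_dropout defaults to 1.0 (training), so keep == keep_prob; feeding
        # use_dropout = 0.0 at test time makes keep == 1.0, i.e. dropout is disabled.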
keep = 1 - self.use_dropout + (self.use_dropout * keep_prob) | |
return tf.nn.dropout(input, keep, name=name) | |
class Model(Network): | |
def setup(self, is_training, num_classes): | |
'''Network definition. | |
Args: | |
is_training: whether to update the running mean and variance of the batch normalisation layer. | |
If the batch size is small, it is better to keep the running mean and variance of | |
                       the pretrained model frozen.
num_classes: number of classes to predict (including background). | |
''' | |
last_both_name = 'bn4a_branch2c' | |
(self.feed('data') | |
.conv(7, 7, 64, 2, 2, biased=False, relu=False, name='conv1') | |
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn_conv1') | |
.max_pool(3, 3, 2, 2, name='pool1') | |
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2a_branch1') | |
.batch_normalization(is_training=is_training, activation_fn=None, name='bn2a_branch1')) | |
(self.feed('pool1') | |
.conv(1, 1, 64, 1, 1, biased=False, relu=False, name='res2a_branch2a') | |
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn2a_branch2a') | |
.conv(3, 3, 64, 1, 1, biased=False, relu=False, name='res2a_branch2b') | |
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn2a_branch2b') | |
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2a_branch2c') | |
.batch_normalization(is_training=is_training, activation_fn=None, name='bn2a_branch2c')) | |
(self.feed('bn2a_branch1', | |
'bn2a_branch2c') | |
.add(name='res2a') | |
.relu(name='res2a_relu') | |
.conv(1, 1, 64, 1, 1, biased=False, relu=False, name='res2b_branch2a') | |
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn2b_branch2a') | |
.conv(3, 3, 64, 1, 1, biased=False, relu=False, name='res2b_branch2b') | |
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn2b_branch2b') | |
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2b_branch2c') | |
.batch_normalization(is_training=is_training, activation_fn=None, name='bn2b_branch2c')) | |
(self.feed('res2a_relu', | |
'bn2b_branch2c') | |
.add(name='res2b') | |
.relu(name='res2b_relu') | |
.conv(1, 1, 64, 1, 1, biased=False, relu=False, name='res2c_branch2a') | |
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn2c_branch2a') | |
.conv(3, 3, 64, 1, 1, biased=False, relu=False, name='res2c_branch2b') | |
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn2c_branch2b') | |
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2c_branch2c') | |
.batch_normalization(is_training=is_training, activation_fn=None, name='bn2c_branch2c')) | |
(self.feed('res2a_relu') | |
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4a_branch2a') | |
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4a_branch2a') | |
.atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4a_branch2b') | |
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4a_branch2b') | |
.conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res4a_branch2c') | |
.batch_normalization(is_training=is_training, activation_fn=None, name='bn4a_branch2c')) | |
(self.feed('bn4a_branch2c') | |
.relu(name='res4a_relu') | |
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b1_branch2a') | |
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b1_branch2a') | |
.atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b1_branch2b') | |
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b1_branch2b') | |
.conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res4b1_branch2c') | |
.batch_normalization(is_training=is_training, activation_fn=None, name='bn4b1_branch2c')) | |
# segmentation | |
(self.feed('res4a_relu', | |
'bn4b1_branch2c') | |
.add(name='res5b') | |
.relu(name='res5b_relu') | |
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res5c_branch2a') | |
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn5c_branch2a') | |
.atrous_conv(3, 3, 256, 4, padding='SAME', biased=False, relu=False, name='res5c_branch2b') | |
.batch_normalization(activation_fn=tf.nn.relu, name='bn5c_branch2b', is_training=is_training) | |
.conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res5c_branch2c') | |
.batch_normalization(is_training=is_training, activation_fn=None, name='bn5c_branch2c')) | |
(self.feed(last_both_name) | |
.conv(3, 3, 64, 1, 1, biased=False, relu=False, name='bc_' + 'con1') | |
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bc_' + 'bn1') | |
.conv(3, 3, 64, 1, 1, biased=False, relu=False, name='bc_' + 'con2') | |
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bc_' + 'bn2') | |
.conv(3, 3, 128, 1, 1, biased=False, relu=False, name='bc_' + 'con3') | |
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bc_' + 'bn3') | |
.conv(3, 3, 128, 1, 1, biased=False, relu=False, name='bc_' + 'con4') | |
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bc_' + 'bn4') | |
.conv(1, 1, 512, 1, 1, biased=False, relu=False, name='bc_' + 'con5') | |
.batch_normalization(is_training=is_training, activation_fn=None, name='bc_' + 'bn5') | |
) | |
(self.feed('res5b_relu', | |
'bn5c_branch2c', 'bc_' + 'bn5') | |
.add(name='res5c') | |
.relu(name='res5c_relu') | |
.atrous_conv(3, 3, num_classes, 6, padding='SAME', relu=False, name='fc1_voc12_c0')) | |
(self.feed('res5c_relu') | |
.atrous_conv(3, 3, num_classes, 12, padding='SAME', relu=False, name='fc1_voc12_c1')) | |
(self.feed('res5c_relu') | |
.atrous_conv(3, 3, num_classes, 18, padding='SAME', relu=False, name='fc1_voc12_c2')) | |
(self.feed('res5c_relu') | |
.atrous_conv(3, 3, num_classes, 24, padding='SAME', relu=False, name='fc1_voc12_c3')) | |
(self.feed('fc1_voc12_c0', | |
'fc1_voc12_c1', | |
'fc1_voc12_c2', | |
'fc1_voc12_c3') | |
.add(name='fc1_voc12')) | |
def prepare_label(input_batch, new_size, num_classes, one_hot=True): | |
"""Resize masks and perform one-hot encoding. | |
Args: | |
input_batch: input tensor of shape [batch_size H W 1]. | |
new_size: a tensor with new height and width. | |
num_classes: number of classes to predict (including background). | |
one_hot: whether perform one-hot encoding. | |
Returns: | |
      Outputs a tensor of shape [batch_size h w num_classes]
with last dimension comprised of 0's and 1's only. | |
""" | |
with tf.name_scope('label_encode'): | |
input_batch = tf.image.resize_nearest_neighbor(input_batch, | |
new_size) # as labels are integer numbers, need to use NN interp. | |
input_batch = tf.squeeze(input_batch, squeeze_dims=[3]) # reducing the channel dimension. | |
if one_hot: | |
input_batch = tf.one_hot(input_batch, depth=num_classes) | |
return input_batch | |
def train(args): | |
"""Create the model and start the training.""" | |
h, w = map(int, args.input_size.split(',')) | |
tf.set_random_seed(args.random_seed) | |
num_classes = 14 | |
batch_size = 10 | |
image_ph = tf.placeholder(tf.float32, (10, h, w, 3)) | |
seg_ph = tf.placeholder(tf.uint8, (10, h, w, 1)) | |
print('begin test') | |
# Create network. | |
is_training = tf.placeholder(tf.bool) | |
net = Model({'data': image_ph}, | |
is_training=is_training, num_classes=num_classes) | |
print('Model size:{:,d}'.format(np.sum([np.prod(v.get_shape().as_list()) for v in tf.trainable_variables()]))) | |
# Predictions. | |
raw_output = net.layers['fc1_voc12'] | |
#################################################################################################################### | |
all_trainable = [v for v in tf.trainable_variables() if 'beta' not in v.name and 'gamma' not in v.name] | |
#################################################################################################################### | |
# Calculate loss | |
# Segmentation loss. | |
raw_prediction = tf.reshape(raw_output, [-1, num_classes]) | |
label_proc = prepare_label(seg_ph, tf.stack(raw_output.get_shape()[1:3]), num_classes=num_classes, | |
one_hot=False) # [batch_size, h, w] | |
raw_gt = tf.reshape(label_proc, [-1, ]) | |
indices = tf.squeeze(tf.where(tf.less_equal(raw_gt, num_classes - 1)), 1) | |
gt = tf.cast(tf.gather(raw_gt, indices), tf.int32) | |
prediction = tf.gather(raw_prediction, indices) | |
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=prediction, labels=gt) | |
l2_losses_seg = [args.weight_decay * tf.nn.l2_loss(v) for v in all_trainable if 'weights' in v.name] | |
reduced_loss = tf.reduce_mean(loss) | |
reduced_loss_with_l2 = reduced_loss + tf.add_n(l2_losses_seg) | |
# Define loss and optimisation parameters. | |
base_lr = tf.constant(args.learning_rate) | |
step_ph = tf.placeholder(dtype=tf.float32, shape=()) | |
learning_rate = tf.scalar_mul(base_lr, tf.pow((1 - step_ph / args.num_steps), args.power)) | |
opt = tf.train.MomentumOptimizer(learning_rate, args.momentum) | |
grads = tf.gradients(reduced_loss_with_l2, tf.trainable_variables()) | |
train_op = opt.apply_gradients(zip(grads, tf.trainable_variables())) | |
print('Model size:{:,d}'.format(np.sum([np.prod(v.get_shape().as_list()) for v in tf.global_variables()]))) | |
######################################################################################################################## | |
    # When is_training is False, I have to set the GPU memory fraction to 0.5 to avoid OOM
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.4) | |
config = tf.ConfigProto(gpu_options=gpu_options) | |
with tf.Session(config=config) as sess: | |
init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer()) | |
sess.run(init_op) | |
for step in range(11): | |
image = np.random.random_integers(0, 255, 10 * h * w * 3).reshape((10, h, w, 3)).astype(np.float32) | |
seg = np.random.random_integers(0, 14, 10 * h * w * 1).reshape((10, h, w, 1)) | |
            # When is_training is fed False, the program raises OOM
feed_dict = {step_ph: step, is_training: False, image_ph: image, seg_ph: seg} | |
sess.run(train_op, feed_dict=feed_dict) | |
def get_arguments(): | |
"""Parse all the arguments provided from the CLI. | |
Returns: | |
      The parsed arguments as an argparse.Namespace.
""" | |
import argparse | |
parser = argparse.ArgumentParser(description="DeepLab-ResNet Network") | |
parser.add_argument("--batch-size", type=int, default=BATCH_SIZE, | |
help="Number of images sent to the network in one step.") | |
parser.add_argument("--ignore-label", type=int, default=IGNORE_LABEL, | |
help="The index of the label to ignore during the training.") | |
parser.add_argument("--input-size", type=str, default=INPUT_SIZE, | |
help="Comma-separated string with height and width of images.") | |
parser.add_argument("--learning-rate", type=float, default=LEARNING_RATE, | |
help="Base learning rate for training with polynomial decay.") | |
parser.add_argument("--momentum", type=float, default=MOMENTUM, | |
help="Momentum component of the optimiser.") | |
parser.add_argument("--num-classes", type=int, default=NUM_CLASSES, | |
help="Number of classes to predict (including background).") | |
parser.add_argument("--num-steps", type=int, default=NUM_STEPS, | |
help="Number of training steps.") | |
parser.add_argument("--power", type=float, default=POWER, | |
help="Decay parameter to compute the learning rate.") | |
parser.add_argument("--random-mirror", action="store_true", | |
help="Whether to randomly mirror the inputs during the training.") | |
parser.add_argument("--random-scale", action="store_true", | |
help="Whether to randomly scale the inputs during the training.") | |
parser.add_argument("--random-seed", type=int, default=RANDOM_SEED, | |
help="Random seed to have reproducible results.") | |
parser.add_argument("--restore-model", action="store_true", | |
help="Whether to restore model from restore-from") | |
parser.add_argument("--save-num-images", type=int, default=SAVE_NUM_IMAGES, | |
help="How many images to save.") | |
parser.add_argument("--save-pred-every", type=int, default=SAVE_PRED_EVERY, | |
help="Save summaries and checkpoint every often.") | |
parser.add_argument("--weight-decay", type=float, default=WEIGHT_DECAY, | |
help="Regularisation parameter for L2-loss.") | |
parser.add_argument("--model", type=str, default='model_joint4', | |
help="Model path") | |
parser.add_argument("--train-list", type=str, default='', | |
help="train file list contains image file list") | |
return parser.parse_args() | |
if __name__ == '__main__': | |
args = get_arguments() | |
print(args) | |
train(args) |
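A commonly used alternative to a fixed per_process_gpu_memory_fraction in TF 1.x is letting the allocator grow on demand. A minimal sketch (it will not by itself explain why is_training=False needs more memory than is_training=True):

import tensorflow as tf

# Ask the BFC allocator to grow as needed instead of pre-reserving a fixed fraction.
gpu_options = tf.GPUOptions(allow_growth=True)
config = tf.ConfigProto(gpu_options=gpu_options)
with tf.Session(config=config) as sess:
    pass  # build and run the graph as in train() above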