problem_test_code.py
Namespace(batch_size=10, ignore_label=0, input_size='228, 304', learning_rate=0.00025, model='model_joint4', momentum=0.9, num_classes=4, num_steps=15001, power=0.9, random_mirror=False, random_scale=False, random_seed=1234, restore_model=False, save_num_images=2, save_pred_every=1000, train_list='', weight_decay=0.0005)
begin test
Model size:3,600,248
Model size:7,209,840
2017-09-16 17:03:50.563408: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-16 17:03:50.563431: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-16 17:03:50.563439: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-09-16 17:03:50.563445: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-16 17:03:50.563451: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-09-16 17:03:51.268766: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:08:00.0
Total memory: 11.90GiB
Free memory: 11.76GiB
2017-09-16 17:03:51.268825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-09-16 17:03:51.268834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2017-09-16 17:03:51.268849: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:08:00.0)
2017-09-16 17:03:54.261152: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 822.66MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-09-16 17:03:54.264369: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 815.94MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-09-16 17:03:54.273886: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 885.94MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-09-16 17:03:54.273918: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.60GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-09-16 17:03:54.273934: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.10GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-09-16 17:03:54.288580: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.11GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-09-16 17:03:54.288611: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.57GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-09-16 17:03:54.305776: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.19GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-09-16 17:03:54.338472: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.49GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-09-16 17:03:54.338534: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 815.94MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-09-16 17:04:04.350564: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 177.19MiB. Current allocation summary follows.
2017-09-16 17:04:04.350611: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256): Total Chunks: 3, Chunks in use: 0 768B allocated for chunks. 64B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-09-16 17:04:04.350629: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512): Total Chunks: 3, Chunks in use: 0 1.8KiB allocated for chunks. 516B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-09-16 17:04:04.350647: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024): Total Chunks: 4, Chunks in use: 0 5.5KiB allocated for chunks. 1.8KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-09-16 17:04:04.350664: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048): Total Chunks: 1, Chunks in use: 0 3.2KiB allocated for chunks. 2.9KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-09-16 17:04:04.350680: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-09-16 17:04:04.350695: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-09-16 17:04:04.350710: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-09-16 17:04:04.350726: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768): Total Chunks: 1, Chunks in use: 0 46.2KiB allocated for chunks. 1.0KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-09-16 17:04:04.350742: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536): Total Chunks: 1, Chunks in use: 0 64.0KiB allocated for chunks. 2.0KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-09-16 17:04:04.350757: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-09-16 17:04:04.350773: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144): Total Chunks: 1, Chunks in use: 0 260.0KiB allocated for chunks. 2.50MiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-09-16 17:04:04.350788: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-09-16 17:04:04.350803: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-09-16 17:04:04.350858: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-09-16 17:04:04.350880: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304): Total Chunks: 1, Chunks in use: 0 4.28MiB allocated for chunks. 2.0KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-09-16 17:04:04.350897: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608): Total Chunks: 1, Chunks in use: 0 14.00MiB allocated for chunks. 2.0KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-09-16 17:04:04.350912: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-09-16 17:04:04.350927: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-09-16 17:04:04.350944: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864): Total Chunks: 1, Chunks in use: 0 126.56MiB allocated for chunks. 126.56MiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-09-16 17:04:04.350958: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
2017-09-16 17:04:04.350973: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
...
2017-09-16 17:04:24.406910: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102876fd200 of size 44359680
2017-09-16 17:04:24.406920: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1028a14b200 of size 44359680
2017-09-16 17:04:24.406930: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1028cb99200 of size 11089920
2017-09-16 17:04:24.406940: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1028d62ca00 of size 44359680
2017-09-16 17:04:24.406950: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1029007aa00 of size 22179840
2017-09-16 17:04:24.406960: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102915a1a00 of size 258048
2017-09-16 17:04:24.406970: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10291917900 of size 3870720
2017-09-16 17:04:24.406980: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10292ac8a00 of size 77629440
2017-09-16 17:04:24.406990: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102974d1200 of size 88719360
2017-09-16 17:04:24.407000: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1029c96d200 of size 88719360
2017-09-16 17:04:24.407010: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102a1e09200 of size 11089920
2017-09-16 17:04:24.407020: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102a289ca00 of size 11089920
2017-09-16 17:04:24.407031: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102a3330200 of size 33269760
2017-09-16 17:04:24.407041: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102a52eaa00 of size 88719360
2017-09-16 17:04:24.407052: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102aa786a00 of size 46694400
2017-09-16 17:04:24.407062: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102ad40ea00 of size 44359680
2017-09-16 17:04:24.407072: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102afe5ca00 of size 44359680
2017-09-16 17:04:24.407082: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102b28aaa00 of size 44359680
2017-09-16 17:04:24.407092: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102b52f8a00 of size 44359680
2017-09-16 17:04:24.407102: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102b7d46a00 of size 88719360
2017-09-16 17:04:24.407111: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102bd1e2a00 of size 11089920
2017-09-16 17:04:24.407121: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102bdc76200 of size 22179840
2017-09-16 17:04:24.407131: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102bf19d200 of size 88719360
2017-09-16 17:04:24.407141: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102c4639200 of size 88719360
2017-09-16 17:04:24.407151: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102c9ad5200 of size 597196800
2017-09-16 17:04:24.407162: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x102ed45d200 of size 353894400
2017-09-16 17:04:24.407172: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x103025dd200 of size 88719360
2017-09-16 17:04:24.407183: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10307a79200 of size 3870720
2017-09-16 17:04:24.407193: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x103086d1000 of size 88719360
2017-09-16 17:04:24.407203: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1030db6d000 of size 22179840
2017-09-16 17:04:24.407213: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1030f094000 of size 44359680
2017-09-16 17:04:24.407222: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10311ae2000 of size 88719360
2017-09-16 17:04:24.407232: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10316f7e000 of size 44359680
2017-09-16 17:04:24.407242: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x103199cc000 of size 88719360
2017-09-16 17:04:24.407252: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1031ee68000 of size 44359680
2017-09-16 17:04:24.407262: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x103218b6000 of size 88719360
2017-09-16 17:04:24.407273: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x10326d52000 of size 132710400
2017-09-16 17:04:24.407288: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020f879600 of size 65536
2017-09-16 17:04:24.407310: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020feaba00 of size 256
2017-09-16 17:04:24.407320: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020feabc00 of size 1536
2017-09-16 17:04:24.407330: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020feac300 of size 512
2017-09-16 17:04:24.407340: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020feac600 of size 256
2017-09-16 17:04:24.407351: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020feaca00 of size 1024
2017-09-16 17:04:24.407361: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020fead000 of size 512
2017-09-16 17:04:24.407371: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020fead300 of size 1792
2017-09-16 17:04:24.407381: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020feae000 of size 3328
2017-09-16 17:04:24.407391: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020feaf100 of size 1280
2017-09-16 17:04:24.407403: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020ff0a600 of size 768
2017-09-16 17:04:24.407413: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020ff0e500 of size 256
2017-09-16 17:04:24.407423: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1020ff12a00 of size 47360
2017-09-16 17:04:24.407434: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x102915e0a00 of size 3370752
2017-09-16 17:04:24.407445: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x10291cc8900 of size 14680320
2017-09-16 17:04:24.407455: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x10307e2a200 of size 9072128
2017-09-16 17:04:24.407465: I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1032ebe2000 of size 258046208
2017-09-16 17:04:24.407475: I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size:
2017-09-16 17:04:24.407490: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 241 Chunks of size 256 totalling 60.2KiB
2017-09-16 17:04:24.407503: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 26 Chunks of size 512 totalling 13.0KiB
2017-09-16 17:04:24.407516: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 119 Chunks of size 1024 totalling 119.0KiB
2017-09-16 17:04:24.407528: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB
2017-09-16 17:04:24.407539: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1536 totalling 1.5KiB
2017-09-16 17:04:24.407552: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 52 Chunks of size 2048 totalling 104.0KiB
2017-09-16 17:04:24.407563: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 16384 totalling 64.0KiB
2017-09-16 17:04:24.407576: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 37632 totalling 73.5KiB
2017-09-16 17:04:24.407588: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 64512 totalling 63.0KiB
2017-09-16 17:04:24.407600: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 19 Chunks of size 65536 totalling 1.19MiB
2017-09-16 17:04:24.407612: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 66304 totalling 64.8KiB
2017-09-16 17:04:24.407624: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 81920 totalling 80.0KiB
2017-09-16 17:04:24.407636: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 11 Chunks of size 147456 totalling 1.55MiB
2017-09-16 17:04:24.407648: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 241920 totalling 236.2KiB
2017-09-16 17:04:24.407660: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 14 Chunks of size 258048 totalling 3.45MiB
2017-09-16 17:04:24.407672: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 258304 totalling 756.8KiB
2017-09-16 17:04:24.407684: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 10 Chunks of size 262144 totalling 2.50MiB
2017-09-16 17:04:24.407696: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 266240 totalling 260.0KiB
2017-09-16 17:04:24.407707: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 294912 totalling 1.12MiB
2017-09-16 17:04:24.407719: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 322816 totalling 315.2KiB
2017-09-16 17:04:24.407732: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 21 Chunks of size 524288 totalling 10.50MiB
2017-09-16 17:04:24.407743: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 589824 totalling 2.25MiB
2017-09-16 17:04:24.407755: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 1179648 totalling 3.38MiB
2017-09-16 17:04:24.407767: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1835008 totalling 1.75MiB
2017-09-16 17:04:24.407779: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 11 Chunks of size 2359296 totalling 24.75MiB
2017-09-16 17:04:24.407791: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 3870720 totalling 7.38MiB
2017-09-16 17:04:24.407802: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 3924992 totalling 3.74MiB
2017-09-16 17:04:24.407814: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 8317440 totalling 7.93MiB
2017-09-16 17:04:24.407826: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 17 Chunks of size 11089920 totalling 179.79MiB
2017-09-16 17:04:24.407839: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 7 Chunks of size 22179840 totalling 148.07MiB
2017-09-16 17:04:24.407851: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 33269760 totalling 31.73MiB
2017-09-16 17:04:24.407863: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 30 Chunks of size 44359680 totalling 1.24GiB
2017-09-16 17:04:24.407875: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 45137920 totalling 43.05MiB
2017-09-16 17:04:24.407887: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 46694400 totalling 44.53MiB
2017-09-16 17:04:24.407899: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 50790400 totalling 48.44MiB
2017-09-16 17:04:24.407911: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 52741120 totalling 50.30MiB
2017-09-16 17:04:24.407923: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 77629440 totalling 148.07MiB
2017-09-16 17:04:24.407935: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 17 Chunks of size 88719360 totalling 1.40GiB
2017-09-16 17:04:24.407947: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 99809280 totalling 95.19MiB
2017-09-16 17:04:24.407959: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 132710400 totalling 126.56MiB
2017-09-16 17:04:24.407972: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 353894400 totalling 337.50MiB
2017-09-16 17:04:24.407984: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 597196800 totalling 569.53MiB
2017-09-16 17:04:24.407996: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 4.50GiB
2017-09-16 17:04:24.408010: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 5112830361
InUse: 4827536384
MaxInUse: 5112738816
NumAllocs: 1291
MaxAllocSize: 4275296000
2017-09-16 17:04:24.408085: W tensorflow/core/common_runtime/bfc_allocator.cc:277] *********************************************************************xxxx**********************_____
2017-09-16 17:04:24.408109: W tensorflow/core/framework/op_kernel.cc:1158] Resource exhausted: OOM when allocating tensor with shape[5760,512,5,6]
Traceback (most recent call last):
File "bug1.py", line 558, in <module>
train(args)
File "bug1.py", line 507, in train
sess.run(train_op, feed_dict=feed_dict)
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1440,512,7,9]
[[Node: gradients/fc1_voc12_c1/convolution_grad/Conv2DBackpropInput = Conv2DBackpropInput[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/fc1_voc12_c1/convolution_grad/Shape, fc1_voc12_c1/weights/read, gradients/fc1_voc12_c1/convolution/BatchToSpaceND_grad/SpaceToBatchND)]]
Caused by op u'gradients/fc1_voc12_c1/convolution_grad/Conv2DBackpropInput', defined at:
File "bug1.py", line 558, in <module>
train(args)
File "bug1.py", line 490, in train
grads = tf.gradients(reduced_loss_with_l2, tf.trainable_variables())
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 540, in gradients
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 346, in _MaybeCompile
return grad_fn() # Exit early
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 540, in <lambda>
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/ops/nn_grad.py", line 445, in _Conv2DGrad
op.get_attr("data_format")),
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 488, in conv2d_backprop_input
data_format=data_format, name=name)
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()
...which was originally created as op u'fc1_voc12_c1/convolution', defined at:
File "bug1.py", line 558, in <module>
train(args)
File "bug1.py", line 458, in train
is_training=is_training, num_classes=num_classes)
File "bug1.py", line 78, in __init__
self.setup(is_training, num_classes)
File "bug1.py", line 408, in setup
.atrous_conv(3, 3, num_classes, 12, padding='SAME', relu=False, name='fc1_voc12_c1'))
File "bug1.py", line 53, in layer_decorated
layer_output = op(self, layer_input, *args, **kwargs)
File "bug1.py", line 203, in atrous_conv
output = convolve(input, kernel)
File "bug1.py", line 198, in <lambda>
convolve = lambda i, k: tf.nn.atrous_conv2d(i, k, dilation, padding=padding)
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 972, in atrous_conv2d
name=name)
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 670, in convolution
op=op)
File "/hik/home/houzhi/.conda/envs/houzhi/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 453, in with_space_to_batch
result = op(input_converted, num_spatial_dims, "VALID")
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1440,512,7,9]
[[Node: gradients/fc1_voc12_c1/convolution_grad/Conv2DBackpropInput = Conv2DBackpropInput[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/fc1_voc12_c1/convolution_grad/Shape, fc1_voc12_c1/weights/read, gradients/fc1_voc12_c1/convolution/BatchToSpaceND_grad/SpaceToBatchND)]]
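The failing allocation is consistent with the numbers in the log above: a float32 tensor of shape [1440, 512, 7, 9] occupies exactly the 177.19MiB that the BFC allocator reports it cannot find, and 1440 is the batch size 10 multiplied by 12² (the batch expansion SpaceToBatchND performs for the rate-12 atrous branch fc1_voc12_c1). A quick sanity check of that arithmetic (illustrative only, not part of the reproduction script below):

# Back-of-the-envelope check of the failed allocation reported above.
batch_size = 10
block_expansion = 12 * 12                          # SpaceToBatchND for dilation rate 12
shape = (batch_size * block_expansion, 512, 7, 9)  # [1440, 512, 7, 9] from the traceback
num_elements = 1
for d in shape:
    num_elements *= d
size_mib = num_elements * 4 / float(1024 ** 2)     # float32 -> 4 bytes per element
print('%.2f MiB' % size_mib)                       # 177.19 MiB, matching the allocator message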
from __future__ import print_function
import numpy as np
import tensorflow as tf
# TODO change
IMG_MEAN = np.array((104.00698793, 116.66876762, 122.67891434), dtype=np.float32)
BATCH_SIZE = 10
IGNORE_LABEL = 0
INPUT_SIZE = '228, 304'
LEARNING_RATE = 2.5e-4
MOMENTUM = 0.9
NUM_CLASSES = 4
NUM_STEPS = 15001
POWER = 0.9
RANDOM_SEED = 1234
SAVE_NUM_IMAGES = 2
SAVE_PRED_EVERY = 1000
MAX_TO_KEEP = 2
WEIGHT_DECAY = 0.0005
# from kaffe.tensorflow import Network
import numpy as np
import tensorflow as tf
slim = tf.contrib.slim
DEFAULT_PADDING = 'SAME'
def layer(op):
'''Decorator for composable network layers.'''
def layer_decorated(self, *args, **kwargs):
# Automatically set a name if not provided.
name = kwargs.setdefault('name', self.get_unique_name(op.__name__))
# Figure out the layer inputs.
if len(self.terminals) == 0:
raise RuntimeError('No input variables found for layer %s.' % name)
elif len(self.terminals) == 1:
layer_input = self.terminals[0]
else:
layer_input = list(self.terminals)
# Perform the operation and get the output.
layer_output = op(self, layer_input, *args, **kwargs)
# Add to layer LUT.
self.layers[name] = layer_output
# This output is now the input for the next layer.
self.feed(layer_output)
# Return self for chained calls.
return self
return layer_decorated
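# Usage sketch (hypothetical layer names, illustration only): a chained call such as
#   (net.feed('data')
#        .conv(3, 3, 64, 1, 1, name='example_conv')
#        .relu(name='example_relu'))
# works because each decorated op reads its input from self.terminals, stores its output
# under `name` in self.layers, and re-feeds that output for the next call in the chain.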
class Network(object):
def __init__(self, inputs, trainable=True, is_training=False, num_classes=21):
# The input nodes for this network
self.inputs = inputs
# The current list of terminal nodes
self.terminals = []
# Mapping from layer names to layers
self.layers = dict(inputs)
# If true, the resulting variables are set as trainable
self.trainable = trainable
# Switch variable for dropout
self.use_dropout = tf.placeholder_with_default(tf.constant(1.0),
shape=[],
name='use_dropout')
self.setup(is_training, num_classes)
def setup(self, is_training, num_classes):
'''Construct the network. '''
raise NotImplementedError('Must be implemented by the subclass.')
def load(self, data_path, session, ignore_missing=False):
'''Load network weights.
data_path: The path to the numpy-serialized network weights
session: The current TensorFlow session
ignore_missing: If true, serialized weights for missing layers are ignored.
'''
data_dict = np.load(data_path).item()
for op_name in data_dict:
with tf.variable_scope(op_name, reuse=True):
for param_name, data in data_dict[op_name].iteritems():
try:
var = tf.get_variable(param_name)
session.run(var.assign(data))
except ValueError:
if not ignore_missing:
raise
def feed(self, *args):
'''Set the input(s) for the next operation by replacing the terminal nodes.
The arguments can be either layer names or the actual layers.
'''
assert len(args) != 0
self.terminals = []
for fed_layer in args:
if isinstance(fed_layer, str):
try:
fed_layer = self.layers[fed_layer]
except KeyError:
raise KeyError('Unknown layer name fed: %s' % fed_layer)
self.terminals.append(fed_layer)
return self
def get_output(self):
'''Returns the current network output.'''
return self.terminals[-1]
def get_unique_name(self, prefix):
'''Returns an index-suffixed unique name for the given prefix.
This is used for auto-generating layer names based on the type-prefix.
'''
ident = sum(t.startswith(prefix) for t, _ in self.layers.items()) + 1
return '%s_%d' % (prefix, ident)
def make_var(self, name, shape):
'''Creates a new TensorFlow variable.'''
return tf.get_variable(name, shape, trainable=self.trainable)
def validate_padding(self, padding):
'''Verifies that the padding is one of the supported ones.'''
assert padding in ('SAME', 'VALID')
@layer
def conv(self,
input,
k_h,
k_w,
c_o,
s_h,
s_w,
name,
relu=True,
padding=DEFAULT_PADDING,
group=1,
biased=True):
# Verify that the padding is acceptable
self.validate_padding(padding)
# Get the number of channels in the input
c_i = input.get_shape()[-1]
# Verify that the grouping parameter is valid
assert c_i % group == 0
assert c_o % group == 0
# Convolution for a given input and kernel
convolve = lambda i, k: tf.nn.conv2d(i, k, [1, s_h, s_w, 1], padding=padding)
with tf.variable_scope(name) as scope:
kernel = self.make_var('weights', shape=[k_h, k_w, c_i / group, c_o])
if group == 1:
# This is the common-case. Convolve the input without any further complications.
output = convolve(input, kernel)
else:
# Split the input into groups and then convolve each of them independently
input_groups = tf.split(input, group, axis=3)
kernel_groups = tf.split(kernel, group, axis=3)
output_groups = [convolve(i, k) for i, k in zip(input_groups, kernel_groups)]
# Concatenate the groups
output = tf.concat(output_groups, axis=3)
# Add the biases
if biased:
biases = self.make_var('biases', [c_o])
output = tf.nn.bias_add(output, biases)
if relu:
# ReLU non-linearity
output = tf.nn.relu(output, name=scope.name)
return output
@layer
def atrous_conv(self,
input,
k_h,
k_w,
c_o,
dilation,
name,
relu=True,
padding=DEFAULT_PADDING,
group=1,
biased=True):
# Verify that the padding is acceptable
self.validate_padding(padding)
# Get the number of channels in the input
c_i = input.get_shape()[-1]
# Verify that the grouping parameter is valid
assert c_i % group == 0
assert c_o % group == 0
# Convolution for a given input and kernel
convolve = lambda i, k: tf.nn.atrous_conv2d(i, k, dilation, padding=padding)
with tf.variable_scope(name) as scope:
kernel = self.make_var('weights', shape=[k_h, k_w, c_i / group, c_o])
if group == 1:
# This is the common-case. Convolve the input without any further complications.
output = convolve(input, kernel)
else:
# Split the input into groups and then convolve each of them independently
input_groups = tf.split(input, group, axis=3)
kernel_groups = tf.split(kernel, group, axis=3)
output_groups = [convolve(i, k) for i, k in zip(input_groups, kernel_groups)]
# Concatenate the groups
output = tf.concat(output_groups, axis=3)
# Add the biases
if biased:
biases = self.make_var('biases', [c_o])
output = tf.nn.bias_add(output, biases)
if relu:
# ReLU non-linearity
output = tf.nn.relu(output, name=scope.name)
return output
@layer
def relu(self, input, name):
return tf.nn.relu(input, name=name)
@layer
def max_pool(self, input, k_h, k_w, s_h, s_w, name, padding=DEFAULT_PADDING):
self.validate_padding(padding)
return tf.nn.max_pool(input,
ksize=[1, k_h, k_w, 1],
strides=[1, s_h, s_w, 1],
padding=padding,
name=name)
@layer
def avg_pool(self, input, k_h, k_w, s_h, s_w, name, padding=DEFAULT_PADDING):
self.validate_padding(padding)
return tf.nn.avg_pool(input,
ksize=[1, k_h, k_w, 1],
strides=[1, s_h, s_w, 1],
padding=padding,
name=name)
@layer
def lrn(self, input, radius, alpha, beta, name, bias=1.0):
return tf.nn.local_response_normalization(input,
depth_radius=radius,
alpha=alpha,
beta=beta,
bias=bias,
name=name)
@layer
def concat(self, inputs, axis, name):
return tf.concat(values=inputs, axis=axis, name=name)
@layer
def add(self, inputs, name):
return tf.add_n(inputs, name=name)
@layer
def fc(self, input, num_out, name, relu=True):
with tf.variable_scope(name) as scope:
input_shape = input.get_shape()
if input_shape.ndims == 4:
# The input is spatial. Vectorize it first.
dim = 1
for d in input_shape[1:].as_list():
dim *= d
feed_in = tf.reshape(input, [-1, dim])
else:
feed_in, dim = (input, input_shape[-1].value)
weights = self.make_var('weights', shape=[dim, num_out])
biases = self.make_var('biases', [num_out])
op = tf.nn.relu_layer if relu else tf.nn.xw_plus_b
fc = op(feed_in, weights, biases, name=scope.name)
return fc
@layer
def softmax(self, input, name):
input_shape = map(lambda v: v.value, input.get_shape())
if len(input_shape) > 2:
# For certain models (like NiN), the singleton spatial dimensions
# need to be explicitly squeezed, since they're not broadcast-able
# in TensorFlow's NHWC ordering (unlike Caffe's NCHW).
if input_shape[1] == 1 and input_shape[2] == 1:
input = tf.squeeze(input, squeeze_dims=[1, 2])
else:
raise ValueError('Rank 2 tensor input expected for softmax!')
return tf.nn.softmax(input, name=name)
@layer
def batch_normalization(self, input, name, is_training, activation_fn=None, scale=True):
with tf.variable_scope(name) as scope:
output = slim.batch_norm(
input,
activation_fn=activation_fn,
is_training=is_training,
updates_collections=None,
scale=scale,
scope=scope)
return output
@layer
def dropout(self, input, keep_prob, name):
keep = 1 - self.use_dropout + (self.use_dropout * keep_prob)
return tf.nn.dropout(input, keep, name=name)
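# Note on the dropout layer above: self.use_dropout acts as a 0/1 switch, so the
# effective keep probability is keep_prob when use_dropout is 1 (training) and 1.0
# (dropout disabled) when it is fed as 0.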
class Model(Network):
def setup(self, is_training, num_classes):
'''Network definition.
Args:
is_training: whether to update the running mean and variance of the batch normalisation layer.
If the batch size is small, it is better to keep the running mean and variance of
the pre-trained model frozen.
num_classes: number of classes to predict (including background).
'''
last_both_name = 'bn4a_branch2c'
(self.feed('data')
.conv(7, 7, 64, 2, 2, biased=False, relu=False, name='conv1')
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn_conv1')
.max_pool(3, 3, 2, 2, name='pool1')
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2a_branch1')
.batch_normalization(is_training=is_training, activation_fn=None, name='bn2a_branch1'))
(self.feed('pool1')
.conv(1, 1, 64, 1, 1, biased=False, relu=False, name='res2a_branch2a')
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn2a_branch2a')
.conv(3, 3, 64, 1, 1, biased=False, relu=False, name='res2a_branch2b')
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn2a_branch2b')
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2a_branch2c')
.batch_normalization(is_training=is_training, activation_fn=None, name='bn2a_branch2c'))
(self.feed('bn2a_branch1',
'bn2a_branch2c')
.add(name='res2a')
.relu(name='res2a_relu')
.conv(1, 1, 64, 1, 1, biased=False, relu=False, name='res2b_branch2a')
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn2b_branch2a')
.conv(3, 3, 64, 1, 1, biased=False, relu=False, name='res2b_branch2b')
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn2b_branch2b')
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2b_branch2c')
.batch_normalization(is_training=is_training, activation_fn=None, name='bn2b_branch2c'))
(self.feed('res2a_relu',
'bn2b_branch2c')
.add(name='res2b')
.relu(name='res2b_relu')
.conv(1, 1, 64, 1, 1, biased=False, relu=False, name='res2c_branch2a')
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn2c_branch2a')
.conv(3, 3, 64, 1, 1, biased=False, relu=False, name='res2c_branch2b')
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn2c_branch2b')
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2c_branch2c')
.batch_normalization(is_training=is_training, activation_fn=None, name='bn2c_branch2c'))
(self.feed('res2a_relu')
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4a_branch2a')
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4a_branch2a')
.atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4a_branch2b')
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4a_branch2b')
.conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res4a_branch2c')
.batch_normalization(is_training=is_training, activation_fn=None, name='bn4a_branch2c'))
(self.feed('bn4a_branch2c')
.relu(name='res4a_relu')
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b1_branch2a')
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b1_branch2a')
.atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b1_branch2b')
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b1_branch2b')
.conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res4b1_branch2c')
.batch_normalization(is_training=is_training, activation_fn=None, name='bn4b1_branch2c'))
# segmentation
(self.feed('res4a_relu',
'bn4b1_branch2c')
.add(name='res5b')
.relu(name='res5b_relu')
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res5c_branch2a')
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn5c_branch2a')
.atrous_conv(3, 3, 256, 4, padding='SAME', biased=False, relu=False, name='res5c_branch2b')
.batch_normalization(activation_fn=tf.nn.relu, name='bn5c_branch2b', is_training=is_training)
.conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res5c_branch2c')
.batch_normalization(is_training=is_training, activation_fn=None, name='bn5c_branch2c'))
(self.feed(last_both_name)
.conv(3, 3, 64, 1, 1, biased=False, relu=False, name='bc_' + 'con1')
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bc_' + 'bn1')
.conv(3, 3, 64, 1, 1, biased=False, relu=False, name='bc_' + 'con2')
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bc_' + 'bn2')
.conv(3, 3, 128, 1, 1, biased=False, relu=False, name='bc_' + 'con3')
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bc_' + 'bn3')
.conv(3, 3, 128, 1, 1, biased=False, relu=False, name='bc_' + 'con4')
.batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bc_' + 'bn4')
.conv(1, 1, 512, 1, 1, biased=False, relu=False, name='bc_' + 'con5')
.batch_normalization(is_training=is_training, activation_fn=None, name='bc_' + 'bn5')
)
(self.feed('res5b_relu',
'bn5c_branch2c', 'bc_' + 'bn5')
.add(name='res5c')
.relu(name='res5c_relu')
.atrous_conv(3, 3, num_classes, 6, padding='SAME', relu=False, name='fc1_voc12_c0'))
(self.feed('res5c_relu')
.atrous_conv(3, 3, num_classes, 12, padding='SAME', relu=False, name='fc1_voc12_c1'))
(self.feed('res5c_relu')
.atrous_conv(3, 3, num_classes, 18, padding='SAME', relu=False, name='fc1_voc12_c2'))
(self.feed('res5c_relu')
.atrous_conv(3, 3, num_classes, 24, padding='SAME', relu=False, name='fc1_voc12_c3'))
(self.feed('fc1_voc12_c0',
'fc1_voc12_c1',
'fc1_voc12_c2',
'fc1_voc12_c3')
.add(name='fc1_voc12'))
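# The four dilated branches above (rates 6, 12, 18 and 24) are summed into 'fc1_voc12';
# the rate-12 branch 'fc1_voc12_c1' is the op named in the OOM traceback above.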
def prepare_label(input_batch, new_size, num_classes, one_hot=True):
"""Resize masks and perform one-hot encoding.
Args:
input_batch: input tensor of shape [batch_size H W 1].
new_size: a tensor with new height and width.
num_classes: number of classes to predict (including background).
one_hot: whether to perform one-hot encoding.
Returns:
Outputs a tensor of shape [batch_size h w num_classes]
with last dimension comprised of 0's and 1's only.
"""
with tf.name_scope('label_encode'):
input_batch = tf.image.resize_nearest_neighbor(input_batch,
new_size) # as labels are integer numbers, need to use NN interp.
input_batch = tf.squeeze(input_batch, squeeze_dims=[3]) # reducing the channel dimension.
if one_hot:
input_batch = tf.one_hot(input_batch, depth=num_classes)
return input_batch
def train(args):
"""Create the model and start the training."""
h, w = map(int, args.input_size.split(','))
tf.set_random_seed(args.random_seed)
num_classes = 14
batch_size = 10
image_ph = tf.placeholder(tf.float32, (10, h, w, 3))
seg_ph = tf.placeholder(tf.uint8, (10, h, w, 1))
print('begin test')
# Create network.
is_training = tf.placeholder(tf.bool)
net = Model({'data': image_ph},
is_training=is_training, num_classes=num_classes)
print('Model size:{:,d}'.format(np.sum([np.prod(v.get_shape().as_list()) for v in tf.trainable_variables()])))
# Predictions.
raw_output = net.layers['fc1_voc12']
####################################################################################################################
all_trainable = [v for v in tf.trainable_variables() if 'beta' not in v.name and 'gamma' not in v.name]
####################################################################################################################
# Calculate loss
# Segmentation loss.
raw_prediction = tf.reshape(raw_output, [-1, num_classes])
label_proc = prepare_label(seg_ph, tf.stack(raw_output.get_shape()[1:3]), num_classes=num_classes,
one_hot=False) # [batch_size, h, w]
raw_gt = tf.reshape(label_proc, [-1, ])
indices = tf.squeeze(tf.where(tf.less_equal(raw_gt, num_classes - 1)), 1)
gt = tf.cast(tf.gather(raw_gt, indices), tf.int32)
prediction = tf.gather(raw_prediction, indices)
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=prediction, labels=gt)
l2_losses_seg = [args.weight_decay * tf.nn.l2_loss(v) for v in all_trainable if 'weights' in v.name]
reduced_loss = tf.reduce_mean(loss)
reduced_loss_with_l2 = reduced_loss + tf.add_n(l2_losses_seg)
# Define loss and optimisation parameters.
base_lr = tf.constant(args.learning_rate)
step_ph = tf.placeholder(dtype=tf.float32, shape=())
learning_rate = tf.scalar_mul(base_lr, tf.pow((1 - step_ph / args.num_steps), args.power))
opt = tf.train.MomentumOptimizer(learning_rate, args.momentum)
grads = tf.gradients(reduced_loss_with_l2, tf.trainable_variables())
train_op = opt.apply_gradients(zip(grads, tf.trainable_variables()))
print('Model size:{:,d}'.format(np.sum([np.prod(v.get_shape().as_list()) for v in tf.global_variables()])))
########################################################################################################################
# When is_training is fed as False, the memory fraction has to be raised to 0.5 to avoid OOM.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.4)
config = tf.ConfigProto(gpu_options=gpu_options)
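# A possible mitigation (a sketch only, not verified to resolve the is_training issue):
# letting TensorFlow grow its GPU allocation on demand instead of capping it at a fixed
# fraction, e.g.
#   gpu_options = tf.GPUOptions(allow_growth=True)
#   config = tf.ConfigProto(gpu_options=gpu_options)
# can reduce fragmentation-related OOM errors in some setups.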
with tf.Session(config=config) as sess:
init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
sess.run(init_op)
for step in range(11):
image = np.random.random_integers(0, 255, 10 * h * w * 3).reshape((10, h, w, 3)).astype(np.float32)
seg = np.random.random_integers(0, 14, 10 * h * w * 1).reshape((10, h, w, 1))
# Feeding is_training=False makes the program raise an OOM error.
feed_dict = {step_ph: step, is_training: False, image_ph: image, seg_ph: seg}
sess.run(train_op, feed_dict=feed_dict)
def get_arguments():
"""Parse all the arguments provided from the CLI.
Returns:
The parsed arguments as an argparse.Namespace.
"""
import argparse
parser = argparse.ArgumentParser(description="DeepLab-ResNet Network")
parser.add_argument("--batch-size", type=int, default=BATCH_SIZE,
help="Number of images sent to the network in one step.")
parser.add_argument("--ignore-label", type=int, default=IGNORE_LABEL,
help="The index of the label to ignore during the training.")
parser.add_argument("--input-size", type=str, default=INPUT_SIZE,
help="Comma-separated string with height and width of images.")
parser.add_argument("--learning-rate", type=float, default=LEARNING_RATE,
help="Base learning rate for training with polynomial decay.")
parser.add_argument("--momentum", type=float, default=MOMENTUM,
help="Momentum component of the optimiser.")
parser.add_argument("--num-classes", type=int, default=NUM_CLASSES,
help="Number of classes to predict (including background).")
parser.add_argument("--num-steps", type=int, default=NUM_STEPS,
help="Number of training steps.")
parser.add_argument("--power", type=float, default=POWER,
help="Decay parameter to compute the learning rate.")
parser.add_argument("--random-mirror", action="store_true",
help="Whether to randomly mirror the inputs during the training.")
parser.add_argument("--random-scale", action="store_true",
help="Whether to randomly scale the inputs during the training.")
parser.add_argument("--random-seed", type=int, default=RANDOM_SEED,
help="Random seed to have reproducible results.")
parser.add_argument("--restore-model", action="store_true",
help="Whether to restore model from restore-from")
parser.add_argument("--save-num-images", type=int, default=SAVE_NUM_IMAGES,
help="How many images to save.")
parser.add_argument("--save-pred-every", type=int, default=SAVE_PRED_EVERY,
help="Save summaries and checkpoint every often.")
parser.add_argument("--weight-decay", type=float, default=WEIGHT_DECAY,
help="Regularisation parameter for L2-loss.")
parser.add_argument("--model", type=str, default='model_joint4',
help="Model path")
parser.add_argument("--train-list", type=str, default='',
help="train file list contains image file list")
return parser.parse_args()
if __name__ == '__main__':
args = get_arguments()
print(args)
train(args)
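For reference, the Namespace printed at the top of the log matches the defaults defined in get_arguments(), so that run is consistent with simply invoking the script (named bug1.py in the traceback) without any flags:

python bug1.py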