Last active
December 19, 2015 13:58
Floppy created this gist
Jul 10, 2013
I'm trying to write a CSV-compatible diff for git, using word-diff, so that I can see things like added columns easily.

In .gitattributes I've added:

```
*.csv diff=csv
```

And in .git/config I've added:

```
[diff "csv"]
    wordRegex = "."
```

This first (very stupid) version works fine, giving me output like this:

```
diff --git a/data.csv b/data.csv
index 3470d93..c795b80 100644
--- a/data.csv
+++ b/data.csv
@@ -1,19 +1,20 @@
planetary_body,a{+phelion,a+}cceleration
"Earth",{+152098232,+}9.80665
"Moon",{+,+}1.625
"Sun",{+0,+}274.1
"{+Mercury",69816900,3.7+}
{+"+}Venus",{+108939000,+}8.872
"Mars",{+249209300,+}3.78
"Jupiter",{+816520800,+}25.93
"Io",{+,+}1.789
"Europa",{+,+}1.314
"Ganymede",{+,+}1.426
"Callisto",{+,+}1.24
"Saturn",1{+513325783,1+}1.19
"Titan",{+,+}1.3455
"Uranus",{+3004419704,+}9.01
"Titania",{+,+}0.379
"Oberon",{+,+}0.347
"Neptune",{+4553946490,+}11.28
"Triton",[- -]{+,+}0.779
"Pluto",[- -]{+7311000000,+}0.61
```

These changes are technically correct, but obviously not very useful. Because it treats every character as an individual word, it splits some changes in unhelpful places, such as the diff in the first line. Ideally the diff should split into actual CSV fields, which means splitting on commas. (I know that's simplistic, but this is the first stage of a better solution.)

My understanding is that the `wordRegex` regexp defines what IS a word. So, as far as I can tell, if I tell it a word is any sequence that doesn't include a comma:

```
[diff "csv"]
    wordRegex = "[^,]*"
```

it should split my fields correctly. But no, this one gives me nothing in my diff. No changes shown.

```
diff --git a/data.csv b/data.csv
index 3470d93..c795b80 100644
--- a/data.csv
+++ b/data.csv
@@ -1,19 +1,20 @@
planetary_body,aphelion,acceleration
"Earth",152098232,9.80665
"Moon",,1.625
"Sun",0,274.1
"Mercury",69816900,3.7
"Venus",108939000,8.872
"Mars",249209300,3.78
"Jupiter",816520800,25.93
"Io",,1.789
"Europa",,1.314
"Ganymede",,1.426
"Callisto",,1.24
"Saturn",1513325783,11.19
"Titan",,1.3455
"Uranus",3004419704,9.01
"Titania",,0.379
"Oberon",,0.347
"Neptune",4553946490,11.28
"Triton",,0.779
"Pluto",7311000000,0.61
```

If I try saying there should be at least one character that's not a comma, I get something:

```
[diff "csv"]
    wordRegex = "[^,]+"
```

```
diff --git a/data.csv b/data.csv
index 3470d93..c795b80 100644
--- a/data.csv
+++ b/data.csv
@@ -1,19 +1,20 @@
planetary_body,{+aphelion+},acceleration
"Earth",152098232,9.80665
"Moon",,1.625
"Sun",0,274.1
"Mercury",69816900,3.7
"Venus",108939000,8.872
"Mars",249209300,3.78
"Jupiter",816520800,25.93
"Io",,1.789
"Europa",,1.314
"Ganymede",,1.426
"Callisto",,1.24
"Saturn",1513325783,11.19
"Titan",,1.3455
"Uranus",3004419704,9.01
"Titania",,0.379
"Oberon",,0.347
"Neptune",4553946490,11.28
"Triton",,0.779
"Pluto",7311000000,0.61
```

However, it seems to have given up after the first line.

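One detail that may matter here: the git documentation for `--word-diff-regex` says that a match containing a newline is silently truncated at the newline. The sketch below is not git's regex engine; it uses Python's `re` module (an assumption about how well the behaviour transfers) purely to show what the two patterns above can match. The helper name `words` and the two-row `SAMPLE` string are made up for illustration.

```python
import re

# Two rows in the same shape as data.csv, joined by newlines.
SAMPLE = 'planetary_body,aphelion,acceleration\n"Earth",152098232,9.80665\n'


def words(pattern, text):
    """Return every non-overlapping match of pattern in text (hypothetical helper)."""
    return [m.group(0) for m in re.finditer(pattern, text)]


# "[^,]*" is allowed to match nothing, so empty "words" appear at each comma.
star_words = words(r"[^,]*", SAMPLE)

# "[^,]+" never matches nothing, but a newline is also "not a comma",
# so a single match runs from the end of one row into the start of the next.
plus_words = words(r"[^,]+", SAMPLE)

if __name__ == "__main__":
    print(star_words)
    print(plus_words)
```

In this sketch `star_words` contains empty strings and `plus_words` contains the cross-row match `'acceleration\n"Earth"'`. Whether either effect is what stops git's word diff after the first line is not confirmed here, but both are worth ruling out.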
If I say there also shouldn't be spaces in words, I get something more sensible that handles multiple lines, but this rule isn't right; spaces should be allowed in fields:

```
[diff "csv"]
    wordRegex = "[^,[:space:]]+"
```

```
diff --git a/data.csv b/data.csv
index 3470d93..c795b80 100644
--- a/data.csv
+++ b/data.csv
@@ -1,19 +1,20 @@
planetary_body,{+aphelion+},acceleration
"Earth",{+152098232+},9.80665
"Moon",,1.625
"Sun",{+0+},274.1
{+"Mercury",69816900,3.7+}
"Venus",{+108939000+},8.872
"Mars",{+249209300+},3.78
"Jupiter",{+816520800+},25.93
"Io",,1.789
"Europa",,1.314
"Ganymede",,1.426
"Callisto",,1.24
"Saturn",{+1513325783+},11.19
"Titan",,1.3455
"Uranus",{+3004419704+},9.01
"Titania",,0.379
"Oberon",,0.347
"Neptune",{+4553946490+},11.28
"Triton",,0.779
"Pluto",{+7311000000+},0.61
```

I'm getting very confused and going round in circles a bit. If I start to try to do things like detect commas at the end of "words", newlines, etc., it all gets a bit unpredictable. I'm testing these regexes with git grep as well and getting them working there, but they seem to behave differently when I try to put them into the word diff.

I get the feeling I'm doing something wrong, but I'm not sure what. Does anyone know?
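To make the trade-off in the last regex concrete, here is a small sketch, again in Python's `re` rather than git. Python has no `[:space:]` class, so `\s` stands in for it (an assumption). The second pattern, which excludes only commas and newlines, is a hypothetical alternative that has not been tested as a `wordRegex` value in .git/config. The rows are made up and are not from data.csv.

```python
import re

# Made-up rows (not from data.csv) with spaces inside the quoted fields.
ROWS = '"Some Moon",,1.0\n"Other Body",2,3.5\n'

# Stand-in for "[^,[:space:]]+": Python's re has no [:space:], so \s is used.
NO_COMMA_NO_SPACE = r"[^,\s]+"

# Hypothetical alternative: exclude only commas and newlines,
# so spaces stay inside a word but a word cannot cross a row boundary.
NO_COMMA_NO_NEWLINE = r"[^,\n]+"

split_on_space = re.findall(NO_COMMA_NO_SPACE, ROWS)
keep_spaces = re.findall(NO_COMMA_NO_NEWLINE, ROWS)

if __name__ == "__main__":
    print(split_on_space)
    print(keep_spaces)
```

With the first pattern a field such as `"Some Moon"` is split into two words; with the second it stays whole. Neither pattern produces a word for an empty field such as the one in `"Some Moon",,1.0`.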