Created January 16, 2025 14:05
community notes successful run
INFO:birdwatch.runner:scorer python version: 3.10.12 (main, Nov 6 2024, 20:22:13) [GCC 11.4.0] | |
INFO:birdwatch.runner:scorer pandas version: 2.2.2 | |
INFO:birdwatch.runner:beginning scorer execution | |
INFO:birdwatch.process_data:Timestamp of latest rating in data: 2025-01-12 01:03:22.523000 | |
INFO:birdwatch.process_data:Timestamp of latest note in data: 2025-01-12 01:02:59.773000 | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_status_history.py, in merge_note_info, at line 31: newNoteStatusHistory = oldNoteStatusHistory.merge( | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_status_history.py, in merge_note_info, at line 31: newNoteStatusHistory = oldNoteStatusHistory.merge( | |
PandasTypeError: Output mismatch on createdAtMillis: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on createdAtMillis_notes: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.note_status_history:total notes added to noteStatusHistory: 62133 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_status_history.py, in merge_note_info, at line 57: newNoteStatusHistory[[c.noteIdKey, c.createdAtMillisKey]].merge( | |
PandasTypeError: Input mismatch on createdAtMillis: left=float64 vs right=int64 (allowed) | |
PandasTypeError: Merge key mismatch on createdAtMillis: left=float64 vs right=int64 (allowed) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/process_data.py, in _filter_misleading_notes, at line 270: ratings = ratings.merge( | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.process_data:Preprocess Data: Filter misleading notes, starting with 121640095 ratings on 1595616 notes | |
INFO:birdwatch.process_data: Keeping 87726864 ratings on 1071361 misleading notes | |
INFO:birdwatch.process_data: Keeping 8970460 ratings on 152922 deleted notes that were previously scored (in note status history) | |
INFO:birdwatch.process_data: Removing 58590 ratings on 2907 older notes that aren't deleted, but are not-misleading. | |
INFO:birdwatch.process_data: Removing 9559 ratings on 1133 notes that were deleted and not in note status history (e.g. old). | |
INFO:birdwatch.process_data:Num Ratings: 121571946, Num Unique Notes Rated: 1591576, Num Unique Raters: 1057435 | |
INFO:birdwatch.process_data:Called filter_input_data_for_testing. | |
Notes: 1583780, Ratings: 121571946. Max note createdAt: 2025-01-12 01:02:59.773000; Max rating createAt: 2025-01-12 01:03:22.523000 | |
INFO:birdwatch.process_data:After filtering notes and ratings after particular timestamp (=None). | |
Notes: 1583780, Ratings: 121571946. Max note createdAt: 2025-01-12 01:02:59.773000; Max rating createAt: 2025-01-12 01:03:22.523000 | |
INFO:birdwatch.process_data:After filtering ratings after first status (plus None hours) for notes created in last 14 days. | |
Notes: 1583780, Ratings: 121571946. Max note createdAt: 2025-01-12 01:02:59.773000; Max rating createAt: 2025-01-12 01:03:22.523000 | |
INFO:birdwatch.process_data:After filtering prescoring notes and ratings to simulate a delay of None hours: | |
Notes: 1583780, Ratings: 121571946. Max note createdAt: 2025-01-12 01:02:59.773000; Max rating createAt: 2025-01-12 01:03:22.523000 | |
INFO:birdwatch.constants:Compute pair counts dict elapsed time: 13884.14 secs (231.40 mins) | |
INFO:birdwatch.constants:Compute PMI and minSim elapsed time: 2636.11 secs (43.94 mins) | |
INFO:birdwatch.constants:Delete unneeded pairs from pairCountsDict elapsed time: 328.19 secs (5.47 mins) | |
INFO:birdwatch.constants:Aggregate into cliques by post selection similarity elapsed time: 20.45 secs (0.34 mins) | |
INFO:birdwatch.constants:Compute Post Selection Similarity elapsed time: 17095.69 secs (284.93 mins) | |
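The timing lines above can be compared programmatically. The following is a minimal sketch, not part of the scorer: it parses the `elapsed time` lines copied from this log and reports each sub-step's share of the Post Selection Similarity stage. The regular expression and the `parse_timings` helper are hypothetical, and treating "Compute Post Selection Similarity" as the enclosing total of the four preceding steps is an assumption based only on the order and magnitude of the logged values.

```python
import re

# Timing lines copied from the log above (trailing table residue removed).
LOG_LINES = [
    "INFO:birdwatch.constants:Compute pair counts dict elapsed time: 13884.14 secs (231.40 mins)",
    "INFO:birdwatch.constants:Compute PMI and minSim elapsed time: 2636.11 secs (43.94 mins)",
    "INFO:birdwatch.constants:Delete unneeded pairs from pairCountsDict elapsed time: 328.19 secs (5.47 mins)",
    "INFO:birdwatch.constants:Aggregate into cliques by post selection similarity elapsed time: 20.45 secs (0.34 mins)",
    "INFO:birdwatch.constants:Compute Post Selection Similarity elapsed time: 17095.69 secs (284.93 mins)",
]

# Hypothetical pattern: logger prefix, step label, then the seconds value.
ELAPSED = re.compile(r"^INFO:[\w.]+:(?P<step>.+?) elapsed time: (?P<secs>[\d.]+) secs")


def parse_timings(lines):
    """Return a {step label: seconds} dict for every line that matches ELAPSED."""
    timings = {}
    for line in lines:
        match = ELAPSED.match(line.strip())
        if match:
            timings[match.group("step")] = float(match.group("secs"))
    return timings


timings = parse_timings(LOG_LINES)

# Assumption: this entry is the total for the whole stage.
total = timings.pop("Compute Post Selection Similarity")
substeps = sum(timings.values())

# Print sub-steps from slowest to fastest with their share of the total.
for step, secs in sorted(timings.items(), key=lambda item: -item[1]):
    print(f"{secs / total:6.1%}  {step}")
print(f"unaccounted: {total - substeps:.2f} secs")
```

Under that assumption, building the pair counts dict accounts for most of the stage, which is where any optimization effort would pay off first.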
INFO:birdwatch.run_scoring:logging environment variables | |
INFO:birdwatch.run_scoring:notes total RAM: 125118992 bytes (0.125 GB) | |
column dtype RAM | |
0 noteId int64 12670240 | |
1 noteAuthorParticipantId object 12670240 | |
2 createdAtMillis int64 12670240 | |
3 tweetId object 12670240 | |
4 classification object 12670240 | |
5 believable category 1583904 | |
6 harmful category 1583904 | |
7 validationDifficulty category 1583904 | |
8 misleadingOther Int8 3167560 | |
9 misleadingFactualError Int8 3167560 | |
10 misleadingManipulatedMedia Int8 3167560 | |
11 misleadingOutdatedInformation Int8 3167560 | |
12 misleadingMissingImportantContext Int8 3167560 | |
13 misleadingUnverifiedClaimAsFact Int8 3167560 | |
14 misleadingSatire Int8 3167560 | |
15 notMisleadingOther Int8 3167560 | |
16 notMisleadingFactuallyCorrect Int8 3167560 | |
17 notMisleadingOutdatedButNotWhenWritten Int8 3167560 | |
18 notMisleadingClearlySatire Int8 3167560 | |
19 notMisleadingPersonalOpinion Int8 3167560 | |
20 trustworthySources Int8 3167560 | |
21 summary object 12670240 | |
22 isMediaNote Int8 3167560 | |
INFO:birdwatch.run_scoring:ratings total RAM: 11549335002 bytes (11.549 GB) | |
column dtype RAM | |
0 noteId int64 972575568 | |
1 raterParticipantId object 972575568 | |
2 createdAtMillis int64 972575568 | |
3 version Int8 243143892 | |
4 agree Int8 243143892 | |
5 disagree Int8 243143892 | |
6 helpful Int8 243143892 | |
7 notHelpful Int8 243143892 | |
8 helpfulnessLevel category 121572078 | |
9 helpfulOther Int8 243143892 | |
10 helpfulInformative Int8 243143892 | |
11 helpfulClear Int8 243143892 | |
12 helpfulEmpathetic Int8 243143892 | |
13 helpfulGoodSources Int8 243143892 | |
14 helpfulUniqueContext Int8 243143892 | |
15 helpfulAddressesClaim Int8 243143892 | |
16 helpfulImportantContext Int8 243143892 | |
17 helpfulUnbiasedLanguage Int8 243143892 | |
18 notHelpfulOther Int8 243143892 | |
19 notHelpfulIncorrect Int8 243143892 | |
20 notHelpfulSourcesMissingOrUnreliable Int8 243143892 | |
21 notHelpfulOpinionSpeculationOrBias Int8 243143892 | |
22 notHelpfulMissingKeyPoints Int8 243143892 | |
23 notHelpfulOutdated Int8 243143892 | |
24 notHelpfulHardToUnderstand Int8 243143892 | |
25 notHelpfulArgumentativeOrBiased Int8 243143892 | |
26 notHelpfulOffTopic Int8 243143892 | |
27 notHelpfulSpamHarassmentOrAbuse Int8 243143892 | |
28 notHelpfulIrrelevantSources Int8 243143892 | |
29 notHelpfulOpinionSpeculation Int8 243143892 | |
30 notHelpfulNoteNotNeeded Int8 243143892 | |
31 ratedOnTweetId int64 972575568 | |
32 helpfulNum float64 972575568 | |
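The `ratings total RAM` figure can be reproduced from the per-column values in the table above. The sketch below is only a consistency check on the logged numbers, not the scorer's own accounting code. The per-row byte costs are inferred from the table: 8 bytes for `int64`, `float64`, and `object` columns (which suggests string contents are not counted, only pointers), and 2 bytes for `Int8` columns, which is consistent with pandas storing nullable integers as one data byte plus one mask byte per row.

```python
NUM_RATINGS = 121_571_946  # "Num Ratings" reported earlier in this log

# Per-row byte costs inferred from the table (assumptions, see lead-in).
BYTES_WIDE = 8   # int64, float64, and object (pointer only)
BYTES_INT8 = 2   # nullable Int8: 1 data byte + 1 mask byte

WIDE_COLS = 5    # noteId, raterParticipantId, createdAtMillis, ratedOnTweetId, helpfulNum
INT8_COLS = 27   # version, agree, disagree, helpful, notHelpful, and the 22 tag columns
CATEGORY_BYTES = 121_572_078  # helpfulnessLevel, copied from the table as-is

wide_bytes = WIDE_COLS * BYTES_WIDE * NUM_RATINGS
int8_bytes = INT8_COLS * BYTES_INT8 * NUM_RATINGS
total = wide_bytes + int8_bytes + CATEGORY_BYTES

print(f"wide columns: {wide_bytes:>14,d} bytes")
print(f"Int8 columns: {int8_bytes:>14,d} bytes")
print(f"category:     {CATEGORY_BYTES:>14,d} bytes")
print(f"total:        {total:>14,d} bytes ({total / 1e9:.3f} GB)")
```

The 27 `Int8` columns make up more than half of the total, so the mask byte alone costs roughly 3.3 GB for this table.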
INFO:birdwatch.run_scoring:noteStatusHistory total RAM: 230089929 bytes (0.230 GB) | |
column dtype RAM | |
0 noteId int64 14269032 | |
1 noteAuthorParticipantId object 14269032 | |
2 createdAtMillis float64 14269032 | |
3 timestampMillisOfFirstNonNMRStatus float64 14269032 | |
4 firstNonNMRStatus category 1783753 | |
5 timestampMillisOfCurrentStatus float64 14269032 | |
6 currentStatus category 1783761 | |
7 timestampMillisOfLatestNonNMRStatus float64 14269032 | |
8 mostRecentNonNMRStatus category 1783753 | |
9 timestampMillisOfStatusLock float64 14269032 | |
10 lockedStatus category 1783761 | |
11 timestampMillisOfRetroLock float64 14269032 | |
12 currentCoreStatus category 1783761 | |
13 currentExpansionStatus category 1783761 | |
14 currentGroupStatus category 1783761 | |
15 currentDecidedBy category 1784377 | |
16 currentModelingGroup float64 14269032 | |
17 timestampMillisOfMostRecentStatusChange float64 14269032 | |
18 timestampMillisOfNmrDueToMinStableCrhTime float64 14269032 | |
19 currentMultiGroupStatus category 1783761 | |
20 currentModelingMultiGroup float64 14269032 | |
21 timestampMinuteOfFinalScoringOutput float64 14269032 | |
22 timestampMillisOfFirstNmrDueToMinStableCrhTime float64 14269032 | |
23 classification object 14269032 | |
INFO:birdwatch.run_scoring:userEnrollment total RAM: 60314631 bytes (0.060 GB) | |
column dtype RAM | |
0 participantId object 8465208 | |
1 enrollmentState object 8465208 | |
2 successfulRatingNeededToEarnIn int64 8465208 | |
3 timestampOfLastStateChange int64 8465208 | |
4 timestampOfLastEarnOut float64 8465208 | |
5 modelingPopulation category 1058175 | |
6 modelingGroup float64 8465208 | |
7 numberOfTimesEarnedOut int64 8465208 | |
INFO:birdwatch.constants:Logging Prescoring Inputs Initial RAM usage elapsed time: 0.04 secs (0.00 mins) | |
INFO:birdwatch.constants:Get Note Topics: Prepare Post Text elapsed time: 17.03 secs (0.28 mins) | |
INFO:birdwatch.topic_model: Notes unassigned due to multiple matches: 1737 | |
INFO:birdwatch.constants:Get Note Topics: Make Seed Labels elapsed time: 83.71 secs (1.40 mins) | |
INFO:birdwatch.topic_model: Initial vocabulary length: 2211370 | |
INFO:birdwatch.topic_model: Total tokens to filter: 13 | |
INFO:birdwatch.topic_model: Total identified stopwords: 1720 | |
INFO:birdwatch.constants:Get Note Topics: Get Stop Words elapsed time: 88.33 secs (1.47 mins) | |
INFO:birdwatch.constants:Get Note Topics: Train Model elapsed time: 422.55 secs (7.04 mins) | |
INFO:birdwatch.topic_model:Assigning notes to topics: | |
INFO:birdwatch.constants:Get Note Topics: Predict elapsed time: 82.19 secs (1.37 mins) | |
INFO:birdwatch.topic_model: Balanced accuracy on raw predictions: 0.7085042641360098 | |
INFO:birdwatch.topic_model: Post Topic assignment results: [908706 26730 54332 2365] | |
INFO:birdwatch.topic_model: Note Topic assignment results: | |
noteTopic | |
GazaConflict 112514 | |
UkraineConflict 45735 | |
MessiRonaldo 4054 | |
Name: count, dtype: int64 | |
INFO:birdwatch.constants:Get Note Topics: Merge and assign predictions elapsed time: 1.84 secs (0.03 mins) | |
INFO:birdwatch.constants:Note Topic Assignment elapsed time: 712.33 secs (11.87 mins) | |
INFO:birdwatch.run_scoring:ratings summary before PSS: 6c81e0cfe5486d369403b28d15ed7c7d8fd7276d885f1f189f668f9c29f7ae6e | |
INFO:birdwatch.run_scoring:Post Selection Similarity Prescoring: begin with 121571946 ratings. | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/post_selection_similarity.py, in filter_ratings_by_post_selection_similarity, at line 85: ratings.merge( | |
PandasTypeError: Output mismatch on postSelectionValue: result=float64 expected=int64 (allowed) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/post_selection_similarity.py, in filter_ratings_by_post_selection_similarity, at line 85: ratings.merge( | |
PandasTypeError: Input mismatch on postSelectionValue: left=float64 vs right=int64 (allowed) | |
PandasTypeError: Output mismatch on postSelectionValue_note_author: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/post_selection_similarity.py:111: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratingsWithPostSelectionSimilarityValue.sort_values( | |
/home/ubuntu/communitynotes/sourcecode/scoring/post_selection_similarity.py:114: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratingsWithPostSelectionSimilarityValue.drop_duplicates( | |
INFO:birdwatch.run_scoring:Post Selection Similarity Prescoring: 120945188 ratings remaining. | |
INFO:birdwatch.constants:Filter ratings by Post Selection Similarity elapsed time: 296.48 secs (4.94 mins) | |
INFO:birdwatch.run_scoring:ratings summary after PSS: 2f7e79680fe98dc326fd2a959a33ea72580b7fede120bb12cdecb47f701dda4a | |
INFO:birdwatch.run_scoring:Error converting user IDs to ints. IDs will remain as strings. ValueError("invalid literal for int() with base 10: 'F35972BBD2F99515FD974E9C7AFD899970F2E4A59115132FAD59EBCB74C0ABE6'") | |
INFO:birdwatch.run_scoring:notes total RAM: 125118992 bytes (0.125 GB) | |
column dtype RAM | |
0 noteId int64 12670240 | |
1 noteAuthorParticipantId object 12670240 | |
2 createdAtMillis int64 12670240 | |
3 tweetId object 12670240 | |
4 classification object 12670240 | |
5 believable category 1583904 | |
6 harmful category 1583904 | |
7 validationDifficulty category 1583904 | |
8 misleadingOther Int8 3167560 | |
9 misleadingFactualError Int8 3167560 | |
10 misleadingManipulatedMedia Int8 3167560 | |
11 misleadingOutdatedInformation Int8 3167560 | |
12 misleadingMissingImportantContext Int8 3167560 | |
13 misleadingUnverifiedClaimAsFact Int8 3167560 | |
14 misleadingSatire Int8 3167560 | |
15 notMisleadingOther Int8 3167560 | |
16 notMisleadingFactuallyCorrect Int8 3167560 | |
17 notMisleadingOutdatedButNotWhenWritten Int8 3167560 | |
18 notMisleadingClearlySatire Int8 3167560 | |
19 notMisleadingPersonalOpinion Int8 3167560 | |
20 trustworthySources Int8 3167560 | |
21 summary object 12670240 | |
22 isMediaNote Int8 3167560 | |
INFO:birdwatch.run_scoring:ratings total RAM: 13424916000 bytes (13.425 GB) | |
column dtype RAM | |
0 noteId int64 967561504 | |
1 raterParticipantId object 967561504 | |
2 createdAtMillis int64 967561504 | |
3 version Int8 241890376 | |
4 agree Int8 241890376 | |
5 disagree Int8 241890376 | |
6 helpful Int8 241890376 | |
7 notHelpful Int8 241890376 | |
8 helpfulnessLevel category 120945320 | |
9 helpfulOther Int8 241890376 | |
10 helpfulInformative Int8 241890376 | |
11 helpfulClear Int8 241890376 | |
12 helpfulEmpathetic Int8 241890376 | |
13 helpfulGoodSources Int8 241890376 | |
14 helpfulUniqueContext Int8 241890376 | |
15 helpfulAddressesClaim Int8 241890376 | |
16 helpfulImportantContext Int8 241890376 | |
17 helpfulUnbiasedLanguage Int8 241890376 | |
18 notHelpfulOther Int8 241890376 | |
19 notHelpfulIncorrect Int8 241890376 | |
20 notHelpfulSourcesMissingOrUnreliable Int8 241890376 | |
21 notHelpfulOpinionSpeculationOrBias Int8 241890376 | |
22 notHelpfulMissingKeyPoints Int8 241890376 | |
23 notHelpfulOutdated Int8 241890376 | |
24 notHelpfulHardToUnderstand Int8 241890376 | |
25 notHelpfulArgumentativeOrBiased Int8 241890376 | |
26 notHelpfulOffTopic Int8 241890376 | |
27 notHelpfulSpamHarassmentOrAbuse Int8 241890376 | |
28 notHelpfulIrrelevantSources Int8 241890376 | |
29 notHelpfulOpinionSpeculation Int8 241890376 | |
30 notHelpfulNoteNotNeeded Int8 241890376 | |
31 ratedOnTweetId int64 967561504 | |
32 helpfulNum float64 967561504 | |
33 postSelectionValue float64 967561504 | |
34 postSelectionValue_note_author float64 967561504 | |
INFO:birdwatch.run_scoring:noteStatusHistory total RAM: 230089929 bytes (0.230 GB) | |
column dtype RAM | |
0 noteId int64 14269032 | |
1 noteAuthorParticipantId object 14269032 | |
2 createdAtMillis float64 14269032 | |
3 timestampMillisOfFirstNonNMRStatus float64 14269032 | |
4 firstNonNMRStatus category 1783753 | |
5 timestampMillisOfCurrentStatus float64 14269032 | |
6 currentStatus category 1783761 | |
7 timestampMillisOfLatestNonNMRStatus float64 14269032 | |
8 mostRecentNonNMRStatus category 1783753 | |
9 timestampMillisOfStatusLock float64 14269032 | |
10 lockedStatus category 1783761 | |
11 timestampMillisOfRetroLock float64 14269032 | |
12 currentCoreStatus category 1783761 | |
13 currentExpansionStatus category 1783761 | |
14 currentGroupStatus category 1783761 | |
15 currentDecidedBy category 1784377 | |
16 currentModelingGroup float64 14269032 | |
17 timestampMillisOfMostRecentStatusChange float64 14269032 | |
18 timestampMillisOfNmrDueToMinStableCrhTime float64 14269032 | |
19 currentMultiGroupStatus category 1783761 | |
20 currentModelingMultiGroup float64 14269032 | |
21 timestampMinuteOfFinalScoringOutput float64 14269032 | |
22 timestampMillisOfFirstNmrDueToMinStableCrhTime float64 14269032 | |
23 classification object 14269032 | |
INFO:birdwatch.run_scoring:userEnrollment total RAM: 60314631 bytes (0.060 GB) | |
column dtype RAM | |
0 participantId object 8465208 | |
1 enrollmentState object 8465208 | |
2 successfulRatingNeededToEarnIn int64 8465208 | |
3 timestampOfLastStateChange int64 8465208 | |
4 timestampOfLastEarnOut float64 8465208 | |
5 modelingPopulation category 1058175 | |
6 modelingGroup float64 8465208 | |
7 numberOfTimesEarnedOut int64 8465208 | |
INFO:birdwatch.constants:Logging Prescoring Inputs RAM usage before _run_scorers elapsed time: 0.04 secs (0.00 mins) | |
INFO:birdwatch.run_scoring:Starting parallel scorer execution with 23 scorers. | |
Patching pandas | |
Pairs dict used 42.949673056GB RAM at max | |
Pairs dict used 42.949673056GB RAM after deleted unneeded pairs | |
SHELL: /bin/bash | |
PWD: /home/ubuntu/communitynotes/sourcecode | |
LOGNAME: ubuntu | |
XDG_SESSION_TYPE: tty | |
MOTD_SHOWN: pam | |
HOME: /home/ubuntu | |
LANG: C.UTF-8 | |
LS_COLORS: rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36: | |
VIRTUAL_ENV: /home/ubuntu/communitynotes/.env | |
SSH_CONNECTION: 71.168.238.143 62078 172.31.29.67 22 | |
LESSCLOSE: /usr/bin/lesspipe %s %s | |
XDG_SESSION_CLASS: user | |
TERM: xterm-256color | |
LESSOPEN: | /usr/bin/lesspipe %s | |
USER: ubuntu | |
SHLVL: 0 | |
XDG_SESSION_ID: 1 | |
VIRTUAL_ENV_PROMPT: (.env) | |
XDG_RUNTIME_DIR: /run/user/1000 | |
PS1: (.env) \[\e]0;\u@\h: \w\a\]${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ | |
SSH_CLIENT: 71.168.238.143 62078 22 | |
XDG_DATA_DIRS: /usr/local/share:/usr/share:/var/lib/snapd/desktop | |
PATH: /home/ubuntu/communitynotes/.env/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin | |
DBUS_SESSION_BUS_ADDRESS: unix:path=/run/user/1000/bus | |
SSH_TTY: /dev/pts/0 | |
OLDPWD: /home/ubuntu/communitynotes | |
_: /home/ubuntu/communitynotes/.env/bin/python3 | |
KMP_INIT_AT_FORK: FALSE | |
KMP_DUPLICATE_LIB_OK: True | |
[Pipeline] .... (step 1 of 3) Processing UnigramEncoder, total= 1.4min | |
[Pipeline] ............. (step 2 of 3) Processing tfidf, total= 2.0s | |
[Pipeline] ........ (step 3 of 3) Processing Classifier, total= 5.6min | |
INFO:birdwatch.run_scoring:MFCoreScorer run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.run_scoring:ReputationScorer run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.run_scoring:MFExpansionScorer run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.run_scoring:MFExpansionPlusScorer run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.run_scoring:MFGroupScorer_12 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.run_scoring:MFGroupScorer_13 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.run_scoring:ReputationScorer run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:ReputationScorer run_scorer_parallelizable: Loading data elapsed time: 30.39 secs (0.51 mins) | |
INFO:birdwatch.run_scoring:MFGroupScorer_12 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_12 run_scorer_parallelizable: Loading data elapsed time: 30.40 secs (0.51 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for ReputationScorer set to: 12 | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_12 set to: 4 | |
INFO:birdwatch.run_scoring:MFGroupScorer_13 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_13 run_scorer_parallelizable: Loading data elapsed time: 31.34 secs (0.52 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_13 set to: 8 | |
INFO:birdwatch.run_scoring:MFExpansionPlusScorer run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFExpansionPlusScorer run_scorer_parallelizable: Loading data elapsed time: 33.15 secs (0.55 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFExpansionPlusScorer set to: 12 | |
INFO:birdwatch.run_scoring:MFCoreScorer run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFCoreScorer run_scorer_parallelizable: Loading data elapsed time: 33.19 secs (0.55 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFCoreScorer set to: 12 | |
INFO:birdwatch.run_scoring:MFExpansionScorer run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFExpansionScorer run_scorer_parallelizable: Loading data elapsed time: 33.36 secs (0.56 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFExpansionScorer set to: 12 | |
INFO:birdwatch.scorer:Filtering ratings for ReputationScorer. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_12. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_13. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.scorer:Filtering ratings for MFCoreScorer. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.scorer:Filtering ratings for MFExpansionScorer. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.scorer:Filtering ratings for MFExpansionPlusScorer. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings after group filter: 787651 | |
INFO:birdwatch.scorer:MFGroupScorer_12 Filter input elapsed time: 47.57 secs (0.79 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_12: 0f90465b45e33ddc4e6ad35e4eb90f44f73f27066c6227a98a420d691f83317e | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 454147, Num Unique Notes Rated: 31788, Num Unique Raters: 6778 | |
INFO:birdwatch.scorer:MFGroupScorer_12 Prepare ratings elapsed time: 0.30 secs (0.01 mins) | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_12: 810b44a434074c6798e27cfb055f3129722e6549fd1448e74a0141bbe3c1c0c4 | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_12: 6f492efd4f4511d61fd17a59c8e01c6947c592b05ce271b1f1bd38d02d012c73 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_12: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 6778, Notes: 31788 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 14.286743425191895 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 67.00309825907347 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.580998420715332 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.090514183044434 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.3292791247367859 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.26007941365242004 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.15845359861850739 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.12058430165052414 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.11117659509181976 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07749021053314209 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.10526006668806076 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07270617038011551 | |
INFO:birdwatch.scorer: Ratings after group filter: 35923731 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10454562306404114 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07203452289104462 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.10445060580968857 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0719677060842514 | |
INFO:birdwatch.scorer: Ratings after group filter: 120945188 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.104437917470932 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07196587324142456 | |
INFO:birdwatch.scorer:MFGroupScorer_13 Filter input elapsed time: 56.90 secs (0.95 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.matrix_factorization:Num epochs: 147 | |
INFO:birdwatch.matrix_factorization:epoch 147 0.10443714261054993 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07196403294801712 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.18704961240291595 | |
INFO:birdwatch.scorer:MFGroupScorer_12 First MF/stable init elapsed time: 7.85 secs (0.13 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_12 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFExpansionPlusScorer Filter input elapsed time: 58.52 secs (0.98 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.66 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.73 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.62 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.59 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.scorer: Ratings after group filter: 104368644 | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scorer:ReputationScorer Filter input elapsed time: 67.01 secs (1.12 mins) | |
INFO:birdwatch.reputation_scorer:seeding with 0 | |
INFO:birdwatch.scorer: Ratings after group filter: 104368644 | |
INFO:birdwatch.scorer: Ratings after group filter: 120942984 | |
INFO:birdwatch.scorer:MFCoreScorer Filter input elapsed time: 66.88 secs (1.11 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.scorer:MFExpansionScorer Filter input elapsed time: 69.20 secs (1.15 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_13: b8a4d412a7493bc9e5797fdcf97b4251af03cae245b70c1f61201b0eba7a1f9e | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 35.79 secs (0.60 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_12 Compute scored notes elapsed time: 43.65 secs (0.73 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.note_ratings:Total ratings: 787531 post-tombstones and 120 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 623486, including 623486 post-tombstones and 0 pre-tombstones. | |
INFO:birdwatch.note_ratings:Total valid ratings: 41592 | |
INFO:birdwatch.scorer:MFGroupScorer_12 Compute valid ratings elapsed time: 1.17 secs (0.02 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_12 Helpfulness scores pre-harassment elapsed time: 0.15 secs (0.00 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 6778 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 23120 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 5936 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 5266 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 454147 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 384412 | |
INFO:birdwatch.scorer:MFGroupScorer_12 Filtering by helpfulness score elapsed time: 0.52 secs (0.01 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 252520 | |
1 16324 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 115568 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 225641, Num Unique Notes Rated: 17556, Num Unique Raters: 4618 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 214444 | |
1 11197 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.049623073820803845 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 19.151915691703135 with BCEWithLogitsLoss | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 4618, Notes: 17556 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 12.852642971064023 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 48.861195322650495 | |
INFO:birdwatch.matrix_factorization:epoch 0 3.402225971221924 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.4904992580413818 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.7329317927360535 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.39929473400115967 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.4511271119117737 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2659628987312317 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.41279053688049316 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2530340850353241 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.40753090381622314 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2509241998195648 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.40611767768859863 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2500147521495819 | |
INFO:birdwatch.matrix_factorization:Num epochs: 116 | |
INFO:birdwatch.matrix_factorization:epoch 116 0.4058106541633606 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.24984769523143768 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.2988259494304657 | |
INFO:birdwatch.scorer:MFGroupScorer_12 Harassment tag consensus elapsed time: 3.33 secs (0.06 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_12 Helpfulness scores post-harassment elapsed time: 0.23 secs (0.00 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 6778 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 23120 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 5556 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 4886 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 454147 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 319849 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 4886, Notes: 31768 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 10.068276252833039 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 65.46234138354482 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.37545326352119446 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3019244372844696 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.10181743651628494 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06601089984178543 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10000446438789368 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06784423440694809 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09862642735242844 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06478048115968704 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.09857840090990067 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06496739387512207 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.0985577404499054 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06467646360397339 | |
INFO:birdwatch.matrix_factorization:Num epochs: 103 | |
INFO:birdwatch.matrix_factorization:epoch 103 0.09855896979570389 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06483707576990128 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.18792936205863953 | |
INFO:birdwatch.constants:Final round MF elapsed time: 4.30 secs (0.07 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_12 prescoring, about to call diligence with 319849 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 001041D12A03F39CCB40BEA9458C469323254EEC76348B... -0.150372 | |
1 002A62303516D0CCE7BCBD143AE53FACB0FE03168AEA4E... 0.201533 | |
2 0037306269989273D720BBD181462AC844B31CB9003939... -0.263402 | |
3 0049F294210C39AE0E4AECF5FC2AC7FC51B7E09B968CC3... 0.116172 | |
4 00661AF4F42FD3F9F04048E1F668A3ADB341546490E117... 0.028281 | |
... ... ... | |
4881 FF9126CC43A7EAF83EA0D93F82BD392D8E20DFBA7E2C90... 0.031915 | |
4882 FFA492BC3E2F5B0DF00DC824605BC9FA92EB3DB63A4042... -0.464677 | |
4883 FFA9BCEF8D874B50FCC1914BB47BE36B2BCAD5EC1396CD... 0.516195 | |
4884 FFB689E24DF9F3E4E9DB93A95E13168392B1382A78C446... 0.012655 | |
4885 FFEEE02BCED1134EB1C57875779C03F2135B72BB4C8E7F... 0.547901 | |
[4886 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 4886, vs. num we are initializing: 4886 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 4886 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=17.751905 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.423125 | time=0.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.969961 | time=1.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.888476 | time=2.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.860836 | time=3.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.846481 | time=3.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.837612 | time=4.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.831625 | time=5.3s | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 34911348, Num Unique Notes Rated: 619472, Num Unique Raters: 164261 | |
INFO:birdwatch.scorer:MFGroupScorer_13 Prepare ratings elapsed time: 20.85 secs (0.35 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=2.827309 | time=6.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=2.824230 | time=6.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=2.821960 | time=7.6s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(0.2406, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=2.821896 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.744053 | time=0.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.734060 | time=1.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.733243 | time=2.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.733213 | time=2.4s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.549105 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.476168 | time=0.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.475229 | time=0.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.475190 | time=1.1s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(1.1678, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 2.8220, 1.7332, 0.4752 | |
INFO:birdwatch.scorer:MFGroupScorer_12 Low Diligence MF elapsed time: 11.51 secs (0.19 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.65 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.30 secs (0.01 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.97 secs (0.02 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.04 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.73 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.65 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.77 secs (0.01 mins) | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 102895565, Num Unique Notes Rated: 1227415, Num Unique Raters: 599301 | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_13: ab6936ebf2558b11024124b8e08611af93010a3679bb847418ff84878ea6323f | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_13: 6f492efd4f4511d61fd17a59c8e01c6947c592b05ce271b1f1bd38d02d012c73 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_13: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 37.09 secs (0.62 mins) | |
INFO:birdwatch.constants:MFGroupScorer_12: Compute tag thresholds for percentiles elapsed time: 0.79 secs (0.01 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.run_scoring:MFGroupScorer_11 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 164261, Notes: 619472 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 56.35661983108195 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 212.53583017271293 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.640757083892822 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.202917098999023 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFCoreScorer: 15e2c97fcaf014c5ae01f848c334cfab12909e7167710283188e965b667e1647 | |
INFO:birdwatch.run_scoring:MFGroupScorer_11 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_11 run_scorer_parallelizable: Loading data elapsed time: 23.84 secs (0.40 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_11 set to: 4 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFExpansionPlusScorer: 2f7e79680fe98dc326fd2a959a33ea72580b7fede120bb12cdecb47f701dda4a | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_11. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFExpansionScorer: af66915ace818d9099ed269e949212ce39c97e4e3ba8043b77d29542f9a73ef2 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.23661787807941437 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1908271461725235 | |
INFO:birdwatch.scorer: Ratings after group filter: 1761412 | |
INFO:birdwatch.scorer:MFGroupScorer_11 Filter input elapsed time: 49.46 secs (0.82 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_11: 9ef0dff96d49113f31d12c854119a519c98b4e2541e9fb78b57b0a770bbcb812 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 153182, Notes: 125844 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 67.92280124598709 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 55.80079252131451 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 1118331, Num Unique Notes Rated: 93288, Num Unique Raters: 8833 | |
INFO:birdwatch.scorer:MFGroupScorer_11 Prepare ratings elapsed time: 0.75 secs (0.01 mins) | |
INFO:birdwatch.matrix_factorization:epoch 0 6.598752498626709 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.155682563781738 | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_11: e618bddc9deb549a78f04c0450c358ebf60a2163ffed5131f304f98211bf5d5a | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_11: 6f492efd4f4511d61fd17a59c8e01c6947c592b05ce271b1f1bd38d02d012c73 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 102895565, Num Unique Notes Rated: 1227415, Num Unique Raters: 599301 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_11: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
INFO:birdwatch.scorer:MFCoreScorer Prepare ratings elapsed time: 60.26 secs (1.00 mins) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 8833, Notes: 93288 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 11.987940571134551 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 126.60828710517379 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.586675643920898 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.107373237609863 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.318657249212265 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.23853722214698792 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.18262220919132233 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.14924296736717224 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.37755173444747925 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.27198973298072815 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.12299879640340805 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0905485600233078 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.10585077106952667 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07555652409791946 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10382040590047836 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07392773777246475 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.10356181859970093 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07373917102813721 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.10352914035320282 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0737101137638092 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.1255543828010559 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0952415019273758 | |
INFO:birdwatch.matrix_factorization:Num epochs: 157 | |
INFO:birdwatch.matrix_factorization:epoch 157 0.10352528840303421 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07370838522911072 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1658429205417633 | |
INFO:birdwatch.scorer:MFGroupScorer_11 First MF/stable init elapsed time: 18.34 secs (0.31 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_11 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.62 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.75 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.03 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.64 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09576630592346191 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07156948745250702 | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.62 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 119170712, Num Unique Notes Rated: 1321123, Num Unique Raters: 760595 | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.72 secs (0.01 mins) | |
INFO:birdwatch.scorer:MFExpansionPlusScorer Prepare ratings elapsed time: 75.84 secs (1.26 mins) | |
INFO:birdwatch.matrix_factorization:epoch 80 0.09140421450138092 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06861695647239685 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 119168555, Num Unique Notes Rated: 1321120, Num Unique Raters: 760575 | |
INFO:birdwatch.scorer:MFExpansionScorer Prepare ratings elapsed time: 75.57 secs (1.26 mins) | |
INFO:birdwatch.matrix_factorization:epoch 100 0.09092728793621063 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06821285933256149 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.19850678741931915 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.16542178392410278 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.0908546894788742 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06811489164829254 | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.64 secs (0.58 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_11 Compute scored notes elapsed time: 43.24 secs (0.72 mins) | |
INFO:birdwatch.matrix_factorization:epoch 140 0.09084469825029373 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0681382268667221 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.note_ratings:Total ratings: 1760882 post-tombstones and 530 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 1437214, including 1437211 post-tombstones and 3 pre-tombstones. | |
INFO:birdwatch.note_ratings:Total valid ratings: 85848 | |
INFO:birdwatch.scorer:MFGroupScorer_11 Compute valid ratings elapsed time: 2.87 secs (0.05 mins) | |
INFO:birdwatch.matrix_factorization:Num epochs: 146 | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
INFO:birdwatch.matrix_factorization:epoch 146 0.09084411710500717 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0681285411119461 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.13832466304302216 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_11 Helpfulness scores pre-harassment elapsed time: 0.32 secs (0.01 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 8833 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 49542 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 7740 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 7161 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 1118331 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 939057 | |
INFO:birdwatch.scorer:MFGroupScorer_11 Filtering by helpfulness score elapsed time: 1.33 secs (0.02 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 562980 | |
1 33536 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 342541 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 464358, Num Unique Notes Rated: 43936, Num Unique Raters: 6379 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 443734 | |
1 20624 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.04441400815749917 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 21.51541892940264 with BCEWithLogitsLoss | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 6379, Notes: 43936 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 10.568963947560087 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 72.79479542248001 | |
INFO:birdwatch.matrix_factorization:epoch 0 3.245861530303955 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.3462748527526855 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.6247559785842896 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2772906720638275 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.39446744322776794 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.22249078750610352 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.36335206031799316 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2124040275812149 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.358895868062973 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.21112506091594696 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.3582291901111603 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.21104197204113007 | |
INFO:birdwatch.matrix_factorization:Num epochs: 103 | |
INFO:birdwatch.matrix_factorization:epoch 103 0.3582249879837036 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.21109746396541595 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.2964789867401123 | |
INFO:birdwatch.scorer:MFGroupScorer_11 Harassment tag consensus elapsed time: 5.59 secs (0.09 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_11 Helpfulness scores post-harassment elapsed time: 0.40 secs (0.01 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 8833 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 49542 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 7158 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 6579 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 1118331 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 738296 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 6579, Notes: 93093 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 7.93073593073593 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 112.22009423924608 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.38544031977653503 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.31715092062950134 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.10276980698108673 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0691324844956398 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.09985589236021042 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06993856281042099 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09861211478710175 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06738175451755524 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.09854987263679504 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06729985028505325 | |
INFO:birdwatch.matrix_factorization:Num epochs: 99 | |
INFO:birdwatch.matrix_factorization:epoch 99 0.09853356331586838 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06713473051786423 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.16698786616325378 | |
INFO:birdwatch.constants:Final round MF elapsed time: 9.04 secs (0.15 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_11 prescoring, about to call diligence with 738296 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 00055253971F408A7AB80D461A543E010EC67DFAF29C45... -0.708740 | |
1 0007EFAB89EB0BCC18E8994B141F291F33C9CB80B9332E... 0.241762 | |
2 000E374F324AEBE8A92439EEC0C3DDE191F293CEF88509... 0.242744 | |
3 001496B1846E8D6B3857F889E75BE6CCB011824EFE36A0... -0.413925 | |
4 00344750A59D7A18770EA50D916A39A1D84ABA1E40CC59... 0.154371 | |
... ... ... | |
6574 FFBD7465A1175CF9CC7D37B2DB9689BA6469FD38417350... -0.045595 | |
6575 FFC1E16D320BD9589C96893BD161C6F9FDE5FC3C7C2D8E... -0.524984 | |
6576 FFC83F58410624DF16CD78060076B6070F13ACA978E417... -0.314590 | |
6577 FFE852866BE827C0D92EAC6FC2A68007E79120FD605090... -0.424833 | |
6578 FFFA49720F254411E1F79CA757C403F0A0217240BC4922... 0.458012 | |
[6579 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 6579, vs. num we are initializing: 6579 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 6579 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=16.887066 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.349328 | time=1.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.898270 | time=3.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.802666 | time=5.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.767455 | time=6.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.748958 | time=8.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.737531 | time=10.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.729917 | time=11.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=2.724682 | time=13.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=2.721061 | time=15.3s | |
INFO:birdwatch.matrix_factorization:epoch 60 0.11744123697280884 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09191382676362991 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=2.718465 | time=17.0s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(-0.1548, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=2.718395 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.602709 | time=1.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.592395 | time=3.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.591535 | time=5.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.591504 | time=5.6s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.515296 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.454399 | time=1.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.453595 | time=1.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.453562 | time=2.4s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(0.6380, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 2.7185, 1.5915, 0.4536 | |
INFO:birdwatch.scorer:MFGroupScorer_11 Low Diligence MF elapsed time: 25.85 secs (0.43 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.65 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.12 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.71 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.63 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.60 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.98 secs (0.02 mins) | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFCoreScorer: dc8f69b3062c675cad659c91fdea98daf4cf2c5c86a315d63225e20cb9aa5b92 | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFCoreScorer: 6f492efd4f4511d61fd17a59c8e01c6947c592b05ce271b1f1bd38d02d012c73 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFCoreScorer: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.78 secs (0.56 mins) | |
INFO:birdwatch.matrix_factorization:epoch 80 0.10625919699668884 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08128134161233902 | |
INFO:birdwatch.constants:MFGroupScorer_11: Compute tag thresholds for percentiles elapsed time: 1.89 secs (0.03 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.run_scoring:MFGroupScorer_10 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFExpansionPlusScorer: f7176c93f2dfaf9c4d69cf2fa77d27c14f3ba5fce0400d8046b9e53279dc779c | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFExpansionPlusScorer: 6f492efd4f4511d61fd17a59c8e01c6947c592b05ce271b1f1bd38d02d012c73 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFExpansionPlusScorer: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
INFO:birdwatch.run_scoring:MFGroupScorer_10 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_10 run_scorer_parallelizable: Loading data elapsed time: 22.38 secs (0.37 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_10 set to: 4 | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
noteId internalNoteFactor1 | |
0 1354933402240229380 -0.112242 | |
1 1357798998405447682 -0.502345 | |
2 1360871260054503427 0.302415 | |
3 1361842531655376899 0.320207 | |
4 1362121547511521284 0.337773 | |
... ... ... | |
125839 1875370284783693963 -0.055580 | |
125840 1649508090755317761 0.386110 | |
125841 1819976965786669296 0.268356 | |
125842 1645870102506622976 -0.198711 | |
125843 1642576751309062145 -0.045856 | |
[125844 rows x 2 columns], | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 F35972BBD2F99515FD974E9C7AFD899970F2E4A5911513... -0.767417 | |
1 9D41130B60D66BCC6FAA1115676546405A37F3BC90991F... -0.756215 | |
2 EBDCB80B1EC4A9FB51C8A562377D72F9569692DEFFC8BC... -0.775990 | |
3 E23374E04DD1B97ED5E4BE68F56CD25AE5DE53DD2A3541... -0.407247 | |
4 60D2AB8839D3EF47DD1C377DD8246EBA76ECB17DD65F13... -0.536566 | |
... ... ... | |
153177 D63468CC312C3BDBF75AC159934BFC855910C255706F7A... 0.234293 | |
153178 9807DA2C5AE0CAD796716CE294B7C2B934961C61D93F10... -0.091149 | |
153179 9127AA4782D5685D5D752EFD4C36A5AEFDF381565E5625... 0.008951 | |
153180 10D6AE831984739AF8414859A6F358AFDAABFB17645AE3... -0.173865 | |
153181 EF2BD2B99FF12B9E306EA110DA6D535C380DC6705AB784... 0.000395 | |
[153182 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 599301, vs. num we are initializing: 153182 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 446119 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 153182 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 1227415, vs. num we are initializing: 125844 | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFExpansionScorer: 5234742b79cb47b4ee11ced73bf98174f16682c5d38bd9376b73109d8240ac76 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 1210846 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 16569 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFExpansionScorer: 6f492efd4f4511d61fd17a59c8e01c6947c592b05ce271b1f1bd38d02d012c73 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFExpansionScorer: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_10. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.119550 | time=0.8s | |
INFO:birdwatch.scorer:MFCoreScorer Prepare data for stable initialization elapsed time: 63.43 secs (1.06 mins) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 153182, Notes: 125844 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 67.92280124598709 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 55.80079252131451 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.598752498626709 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.155682563781738 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.37755173444747925 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.27198973298072815 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10486427694559097 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08077707141637802 | |
INFO:birdwatch.matrix_factorization:Num epochs: 101 | |
INFO:birdwatch.matrix_factorization:epoch 101 0.10486427694559097 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08077707141637802 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.15291544795036316 | |
INFO:birdwatch.scorer:MFGroupScorer_13 First MF/stable init elapsed time: 296.73 secs (4.95 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_13 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.1255543828010559 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0952415019273758 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09576630592346191 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07156948745250702 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings after group filter: 1008102 | |
INFO:birdwatch.scorer:MFGroupScorer_10 Filter input elapsed time: 51.23 secs (0.85 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_10: 66205f3b8debe8396edcd5e3841d972efb5fc6600cbb9478ee18ec8576e21804 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.09140421450138092 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06861695647239685 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 498855, Num Unique Notes Rated: 44531, Num Unique Raters: 6286 | |
INFO:birdwatch.scorer:MFGroupScorer_10 Prepare ratings elapsed time: 0.34 secs (0.01 mins) | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_10: 85af8802cea96424b506383f4c5c13bf4ec2b020a2c759c0724746fe62d0c000 | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_10: 6f492efd4f4511d61fd17a59c8e01c6947c592b05ce271b1f1bd38d02d012c73 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_10: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 6286, Notes: 44531 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 11.2024207855202 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 79.3596881959911 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.574841022491455 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.087750434875488 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.3223755955696106 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2485727220773697 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.14334002137184143 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10413645207881927 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10707332193851471 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0736645832657814 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.10193940252065659 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06991449743509293 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10126650333404541 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06940615922212601 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.10117784142494202 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06933258473873138 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.09092728793621063 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06821285933256149 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.10116661339998245 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06932461261749268 | |
INFO:birdwatch.matrix_factorization:Num epochs: 148 | |
INFO:birdwatch.matrix_factorization:epoch 148 0.10116579383611679 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06932135671377182 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1761155128479004 | |
INFO:birdwatch.scorer:MFGroupScorer_10 First MF/stable init elapsed time: 8.15 secs (0.14 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_10 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.68 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.83 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:epoch 120 0.0908546894788742 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06811489164829254 | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.64 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.64 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.12 secs (0.00 mins) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.77 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.78 secs (0.01 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.67 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.64 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.76 secs (0.01 mins) | |
INFO:birdwatch.scorer:MFExpansionPlusScorer Prepare data for stable initialization elapsed time: 81.33 secs (1.36 mins) | |
INFO:birdwatch.matrix_factorization:epoch 140 0.09084469825029373 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0681382268667221 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 146965, Notes: 104186 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 63.31568540878813 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 44.88557139454972 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.599421501159668 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.155450344085693 | |
INFO:birdwatch.matrix_factorization:Num epochs: 146 | |
INFO:birdwatch.matrix_factorization:epoch 146 0.09084411710500717 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0681285411119461 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.13832466304302216 | |
INFO:birdwatch.scorer:MFCoreScorer MF on stable-initialization subset elapsed time: 71.82 secs (1.20 mins) | |
INFO:birdwatch.scorer:MFExpansionScorer Prepare data for stable initialization elapsed time: 77.78 secs (1.30 mins) | |
INFO:birdwatch.matrix_factorization:epoch 20 0.35630518198013306 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.29604703187942505 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 146966, Notes: 104188 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 63.31675432871348 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 44.886885402065786 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.60595703125 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.162001132965088 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.12139009684324265 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09409978240728378 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.3037235140800476 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.24657271802425385 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09513632208108902 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07131683826446533 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.11647383868694305 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08917077630758286 | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.77 secs (0.58 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_13 Compute scored notes elapsed time: 78.88 secs (1.31 mins) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 38.46 secs (0.64 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_10 Compute scored notes elapsed time: 47.93 secs (0.80 mins) | |
INFO:birdwatch.matrix_factorization:epoch 80 0.09121275693178177 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06829170882701874 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.note_ratings:Total ratings: 1007765 post-tombstones and 337 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 810566, including 810566 post-tombstones and 0 pre-tombstones. | |
INFO:birdwatch.note_ratings:Total valid ratings: 48987 | |
INFO:birdwatch.scorer:MFGroupScorer_10 Compute valid ratings elapsed time: 1.48 secs (0.02 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_10 Helpfulness scores pre-harassment elapsed time: 0.21 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09450153261423111 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07094471901655197 | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 6286 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 31122 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 5807 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 5143 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 498855 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 423387 | |
INFO:birdwatch.scorer:MFGroupScorer_10 Filtering by helpfulness score elapsed time: 0.62 secs (0.01 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 269073 | |
1 16016 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 138298 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 215406, Num Unique Notes Rated: 19523, Num Unique Raters: 4421 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 206414 | |
1 8992 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.04174442680333881 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 22.95529359430605 with BCEWithLogitsLoss | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 4421, Notes: 19523 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 11.033447728320443 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 48.72336575435422 | |
INFO:birdwatch.matrix_factorization:epoch 0 3.2857961654663086 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.379257321357727 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.6208410263061523 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2738921046257019 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.39983925223350525 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2223479449748993 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.36844533681869507 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.211097851395607 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.36397483944892883 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.21021240949630737 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.36326712369918823 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.21010172367095947 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.3631865382194519 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.21013574302196503 | |
INFO:birdwatch.matrix_factorization:Num epochs: 129 | |
INFO:birdwatch.matrix_factorization:epoch 129 0.3631788194179535 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.21012674272060394 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.3147020637989044 | |
INFO:birdwatch.scorer:MFGroupScorer_10 Harassment tag consensus elapsed time: 3.80 secs (0.06 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_10 Helpfulness scores post-harassment elapsed time: 0.27 secs (0.00 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 6286 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 31122 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 5471 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 4807 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 498855 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 348357 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.09067538380622864 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06799691170454025 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 4807, Notes: 44436 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 7.839522009181745 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 72.46869149157479 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.3736807107925415 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.30196502804756165 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.09801459312438965 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06264984607696533 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.09106601774692535 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06829490512609482 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.09619583934545517 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06445912271738052 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.0948190987110138 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06150089204311371 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.09477508068084717 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.061673518270254135 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.09475497901439667 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06141379475593567 | |
INFO:birdwatch.matrix_factorization:Num epochs: 103 | |
INFO:birdwatch.matrix_factorization:epoch 103 0.09475598484277725 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06156698241829872 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1765308380126953 | |
INFO:birdwatch.constants:Final round MF elapsed time: 4.98 secs (0.08 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_10 prescoring, about to call diligence with 348357 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 000D424F8BBD591A0725D5F6F54F78C50C8DC591637C0E... 0.043849 | |
1 002ADDCBF2E4A2F363B766024F866D803ED65C8AF3759C... -0.576816 | |
2 0033B06B2B9E22875E057C84D99E2634127C4291A081B4... -0.279149 | |
3 003FDF9A655454DDED55D10DDC81830B57A59BEED1847D... 0.451751 | |
4 004FF8092304B71DF706338FA263DCACD3EE439A34C930... -0.640659 | |
... ... ... | |
4802 FFA25730921C4BBCBDFFFBB55A28CDB67BD00A30F74FEF... 0.825386 | |
4803 FFB14685679DE209BD2EB051060B796657AE6158314F58... -0.554702 | |
4804 FFC6993701C48435AB714C158FFD8420268574F35A55EE... -0.072824 | |
4805 FFE9E0E39C0049AD113CEF0AB5178393F13B15C4E7B31C... -0.081506 | |
4806 FFF104BC8D2B5E53432FF3E605B5D5D76EDECE29AFA0F5... 0.593214 | |
[4807 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 4807, vs. num we are initializing: 4807 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 4807 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=15.998718 | time=0.0s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.267973 | time=0.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.826852 | time=1.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.734086 | time=2.8s | |
INFO:birdwatch.matrix_factorization:epoch 120 0.09061166644096375 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06794039160013199 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.699100 | time=3.7s | |
INFO:birdwatch.matrix_factorization:epoch 100 0.09066429734230042 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06797884404659271 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.680363 | time=4.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.668867 | time=5.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.661324 | time=6.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=2.656229 | time=7.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=2.652678 | time=8.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=2.650150 | time=9.1s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(-0.2922, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=2.650073 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.551369 | time=0.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.542536 | time=1.7s | |
INFO:birdwatch.matrix_factorization:epoch 140 0.09060333669185638 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06792472302913666 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.541726 | time=2.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.541695 | time=2.9s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.537292 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.474567 | time=0.5s | |
INFO:birdwatch.matrix_factorization:epoch 120 0.09060990810394287 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06794218719005585 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.473745 | time=1.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=070 | loss=0.473718 | time=1.2s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(0.4797, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 2.6502, 1.5417, 0.4737 | |
INFO:birdwatch.scorer:MFGroupScorer_10 Low Diligence MF elapsed time: 13.70 secs (0.23 mins) | |
INFO:birdwatch.matrix_factorization:Num epochs: 147 | |
INFO:birdwatch.matrix_factorization:epoch 147 0.09060263633728027 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06792367994785309 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.13638591766357422 | |
INFO:birdwatch.scorer:MFExpansionPlusScorer MF on stable-initialization subset elapsed time: 60.20 secs (1.00 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.64 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.016365 | time=135.3s | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.75 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.68 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:Num epochs: 140 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.09060298651456833 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06791942566633224 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.13668499886989594 | |
INFO:birdwatch.scorer:MFExpansionScorer MF on stable-initialization subset elapsed time: 57.30 secs (0.96 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.66 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.74 secs (0.01 mins) | |
INFO:birdwatch.note_ratings:Total ratings: 35709567 post-tombstones and 214164 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 29097883, including 29032087 post-tombstones and 65796 pre-tombstones. | |
INFO:birdwatch.note_ratings:Total valid ratings: 1397716 | |
INFO:birdwatch.scorer:MFGroupScorer_13 Compute valid ratings elapsed time: 50.33 secs (0.84 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_13 Helpfulness scores pre-harassment elapsed time: 1.98 secs (0.03 mins) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 599301, Notes: 1227415 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py, in _initialize_parameters, at line 180: noteInit = self.noteIdMap.merge( | |
PandasTypeError: Output mismatch on noteIndex_y: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 83.83111254139799 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 171.69263024757174 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.25071707367897034 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2179812788963318 | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 38.70 secs (0.65 mins) | |
INFO:birdwatch.constants:MFGroupScorer_10: Compute tag thresholds for percentiles elapsed time: 1.03 secs (0.02 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.run_scoring:MFGroupScorer_9 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 164261 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 226673 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 121082 | |
INFO:birdwatch.run_scoring:MFGroupScorer_9 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_9 run_scorer_parallelizable: Loading data elapsed time: 23.36 secs (0.39 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_9 set to: 4 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 114140 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 34911348 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 26402896 | |
INFO:birdwatch.scorer:MFGroupScorer_13 Filtering by helpfulness score elapsed time: 50.34 secs (0.84 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_9. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 14899456 | |
1 1457784 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 9975530 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 15918931, Num Unique Notes Rated: 456513, Num Unique Raters: 109032 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 14513353 | |
1 1405578 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.08829600429827857 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 10.325540809545966 with BCEWithLogitsLoss | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 760595, Notes: 1321123 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py, in _initialize_parameters, at line 180: noteInit = self.noteIdMap.merge( | |
PandasTypeError: Output mismatch on noteIndex_y: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 90.20410060229062 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 156.68090376613046 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 109032, Notes: 456513 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 34.87070685829319 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 146.00237544940936 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 760575, Notes: 1321120 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
INFO:birdwatch.matrix_factorization:epoch 0 0.24722672998905182 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.21597912907600403 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py, in _initialize_parameters, at line 180: noteInit = self.noteIdMap.merge( | |
PandasTypeError: Output mismatch on noteIndex_y: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 90.2026727322272 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 156.68218781842685 | |
INFO:birdwatch.matrix_factorization:epoch 0 3.200676441192627 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.3540019989013672 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.24721327424049377 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.21593815088272095 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings after group filter: 5652192 | |
INFO:birdwatch.scorer:MFGroupScorer_9 Filter input elapsed time: 57.42 secs (0.96 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_9: c4fef79686c2ee4fd68ac275c54aeb5d1e32282bc7f3b660b30926aa178dbb0a | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 4951595, Num Unique Notes Rated: 160526, Num Unique Raters: 40560 | |
INFO:birdwatch.scorer:MFGroupScorer_9 Prepare ratings elapsed time: 3.13 secs (0.05 mins) | |
INFO:birdwatch.matrix_factorization:epoch 20 0.6822740435600281 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.39618825912475586 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.015030 | time=274.9s | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_9: 98a718ca9f1d69e27f48dece344405735d05694ecf4b4975bf6634d5dda3ca16 | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_9: 6f492efd4f4511d61fd17a59c8e01c6947c592b05ce271b1f1bd38d02d012c73 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_9: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 40560, Notes: 160526 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 30.8460623201226 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 122.08074457593689 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.183657646179199 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 5.728916645050049 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.3331329822540283 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2727791965007782 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.1216716542840004 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09572986513376236 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.44605863094329834 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.31202468276023865 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.13334016501903534 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09765860438346863 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10470187664031982 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07548728585243225 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.10087470710277557 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.072791688144207 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.41267022490501404 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2986799478530884 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10039913654327393 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07229926437139511 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.1003374308347702 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07221179455518723 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.4079984128475189 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2966216206550598 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.10032914578914642 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07223789393901825 | |
INFO:birdwatch.matrix_factorization:Num epochs: 144 | |
INFO:birdwatch.matrix_factorization:epoch 144 0.10032880306243896 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0722227394580841 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.16858090460300446 | |
INFO:birdwatch.scorer:MFGroupScorer_9 First MF/stable init elapsed time: 101.77 secs (1.70 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_9 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.78 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:epoch 20 0.12331050634384155 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09613346308469772 | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.68 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.77 secs (0.01 mins) | |
INFO:birdwatch.matrix_factorization:epoch 100 0.40733471512794495 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2962554395198822 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.1233014315366745 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09613857418298721 | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.62 secs (0.58 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_9 Compute scored notes elapsed time: 47.17 secs (0.79 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.note_ratings:Total ratings: 5651228 post-tombstones and 964 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 4500014, including 4500009 post-tombstones and 5 pre-tombstones. | |
INFO:birdwatch.matrix_factorization:epoch 40 0.11222019046545029 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08670960366725922 | |
INFO:birdwatch.note_ratings:Total valid ratings: 366137 | |
INFO:birdwatch.scorer:MFGroupScorer_9 Compute valid ratings elapsed time: 6.25 secs (0.10 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_9 Helpfulness scores pre-harassment elapsed time: 0.60 secs (0.01 mins) | |
INFO:birdwatch.matrix_factorization:epoch 120 0.407224178314209 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2962098717689514 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=0.014969 | time=433.3s | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 40560 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 90695 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 34739 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 32473 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 4951595 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 4189292 | |
INFO:birdwatch.scorer:MFGroupScorer_9 Filtering by helpfulness score elapsed time: 6.82 secs (0.11 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 2751553 | |
1 188174 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 1249565 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 2783522, Num Unique Notes Rated: 105290, Num Unique Raters: 31066 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 2618713 | |
1 164809 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.05920880093636767 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 15.88938104108392 with BCEWithLogitsLoss | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 31066, Notes: 105290 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 26.436717637002566 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 89.6002703920685 | |
INFO:birdwatch.matrix_factorization:epoch 0 3.3624095916748047 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.4721662998199463 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.7069790959358215 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.4047195315361023 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.45554566383361816 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2976972162723541 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.40719953179359436 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2962021827697754 | |
INFO:birdwatch.matrix_factorization:Num epochs: 141 | |
INFO:birdwatch.matrix_factorization:epoch 141 0.40719953179359436 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2962021827697754 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.20349323749542236 | |
INFO:birdwatch.scorer:MFGroupScorer_13 Harassment tag consensus elapsed time: 260.28 secs (4.34 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.41886022686958313 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2849714159965515 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_13 Helpfulness scores post-harassment elapsed time: 6.19 secs (0.10 mins) | |
INFO:birdwatch.matrix_factorization:epoch 80 0.4139789342880249 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2824951708316803 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.4132935404777527 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2821630835533142 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.4131905734539032 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2821492552757263 | |
INFO:birdwatch.matrix_factorization:Num epochs: 127 | |
INFO:birdwatch.matrix_factorization:epoch 127 0.41318556666374207 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.28213831782341003 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.22360152006149292 | |
INFO:birdwatch.scorer:MFGroupScorer_9 Harassment tag consensus elapsed time: 50.05 secs (0.83 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_9 Helpfulness scores post-harassment elapsed time: 1.05 secs (0.02 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 40560 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 90695 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 30820 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 28554 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 4951595 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 3195238 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 28554, Notes: 160377 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 19.923293240302538 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 111.90158996988163 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.3980688452720642 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.33062678575515747 | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 164261 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 226673 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 111397 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.10257433354854584 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0703306794166565 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.09863606840372086 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07039089500904083 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 104455 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 34911348 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 19428361 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09800463169813156 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06947032362222672 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.09782175719738007 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0685548186302185 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.11341414600610733 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08804556727409363 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 104455, Notes: 618457 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 31.414247069723523 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 185.99742472835192 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.09781087934970856 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06855683773756027 | |
INFO:birdwatch.matrix_factorization:Num epochs: 101 | |
INFO:birdwatch.matrix_factorization:epoch 101 0.09781087934970856 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06855683773756027 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.171379953622818 | |
INFO:birdwatch.constants:Final round MF elapsed time: 47.95 secs (0.80 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_9 prescoring, about to call diligence with 3195238 final round ratings. | |
INFO:birdwatch.matrix_factorization:epoch 0 0.4043160080909729 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.34436196088790894 | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
                                      raterParticipantId  internalRaterFactor1 | |
0      00029D1FDD352D79B5073189C3F2BDF6377581F50D66C1...             -0.628695 | |
1      000415A1E3D1DA95BD626E1D938E4A9AFFB446D1A7D532...              0.673919 | |
2      00041B33023A7D5BCE252803A32E50E9AFCC1584F63ED4...             -0.252844 | |
3      0005FD5ECF92B548D17E663347D5E696806076F75457A1...             -0.361711 | |
4      000929DF3AFDB652A896FC0BA7FF91D9FBF4F3214D8392...             -0.504606 | |
...                                                  ...                   ... | |
28549  FFF69B7E7ACFBB1E413F8B85384A9EB245A8D8B85F76C9...              0.016865 | |
28550  FFF771FF9CA763466ADA4DA853867E7371DEE6D71C50CB...             -0.342952 | |
28551  FFFDAB98EE31EC0CC51169937F859D5B676870C6470C19...              0.480560 | |
28552  FFFEB3E291D915645E08FD13A9BFE66B5912FE45306D25...             -0.323640 | |
28553  FFFF8C877BDC3CEFEFD0D4C5F0E8B4BE537F5023A1F31F...             -0.518907 | |
[28554 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 28554, vs. num we are initializing: 28554 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 28554 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=17.627953 | time=0.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.549652 | time=8.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.134783 | time=16.2s | |
INFO:birdwatch.matrix_factorization:epoch 60 0.11096317321062088 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08592578023672104 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.074007 | time=26.5s | |
INFO:birdwatch.matrix_factorization:epoch 40 0.11341163516044617 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08804169297218323 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.058036 | time=34.5s | |
INFO:birdwatch.matrix_factorization:epoch 20 0.10835500061511993 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07922337204217911 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=0.014966 | time=581.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.051465 | time=42.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.047912 | time=50.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.045682 | time=59.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.044106 | time=67.8s | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10419486463069916 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07894574105739594 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.043009 | time=78.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.042234 | time=86.0s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(0.7144, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.042212 | time=0.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=130 | loss=0.014966 | time=634.5s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(0.4173, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.172663 | time=1.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.865350 | time=7.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.854708 | time=15.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.853917 | time=22.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.853892 | time=25.2s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.408889 | time=0.0s | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10389487445354462 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0786510705947876 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.337088 | time=4.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.336210 | time=8.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.336173 | time=10.7s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(1.8723, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 3.0422, 1.8539, 0.3362 | |
INFO:birdwatch.scorer:MFGroupScorer_9 Low Diligence MF elapsed time: 125.71 secs (2.10 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.77 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.70 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.68 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.80 secs (0.01 mins) | |
INFO:birdwatch.matrix_factorization:epoch 60 0.11200868338346481 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08672810345888138 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.103670135140419 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0776614174246788 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.11079816520214081 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0856325626373291 | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.64 secs (0.58 mins) | |
INFO:birdwatch.constants:MFGroupScorer_9: Compute tag thresholds for percentiles elapsed time: 7.77 secs (0.13 mins) | |
INFO:birdwatch.matrix_factorization:epoch 60 0.1120082437992096 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08672945946455002 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.run_scoring:MFGroupScorer_8 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10364771634340286 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07776617258787155 | |
INFO:birdwatch.matrix_factorization:Num epochs: 101 | |
INFO:birdwatch.matrix_factorization:epoch 101 0.10364771634340286 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07776617258787155 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1562773585319519 | |
INFO:birdwatch.constants:Final round MF elapsed time: 260.32 secs (4.34 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_13 prescoring, about to call diligence with 19428361 final round ratings. | |
INFO:birdwatch.run_scoring:MFGroupScorer_8 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_8 run_scorer_parallelizable: Loading data elapsed time: 25.51 secs (0.43 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_8 set to: 4 | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... 0.634952 | |
1 00022C96980039352E2D04B5E533090FA8BA333F87C5EB... -0.187200 | |
2 0002CA11E7127598E26C281F887129ADA2623C82BBCE8F... -0.431222 | |
3 00043CBC4A8DCE4003E776DCD459F07595B529D190FE6A... -0.583918 | |
4 0006A0E14304DF01B1004C185280BD0429F985BC9BA3BE... -0.030831 | |
... ... ... | |
104450 FFFE3F2AD0851826664EA471BEA111C1EF31AD64EC79A8... -0.468143 | |
104451 FFFE47B0979CC079B88D01EEBB42203E78DD1CC8115671... 0.051052 | |
104452 FFFE83C62E7D3E361E85273D9A8BC1D7D206AF97FAA90E... -0.065814 | |
104453 FFFEB27D6E27351D14EB43777F265F694744ABB4B3B7AD... -0.660024 | |
104454 FFFF7E0B3ADB6FC5FB42B0F01FFD24495410C1AE4AC986... -0.025618 | |
[104455 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 104455, vs. num we are initializing: 104455 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 104455 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=18.207840 | time=0.4s | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_8. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.092598 | time=157.5s | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.845192 | time=40.4s | |
INFO:birdwatch.scorer: Ratings after group filter: 769482 | |
INFO:birdwatch.scorer:MFGroupScorer_8 Filter input elapsed time: 49.24 secs (0.82 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_8: 640be1b461914c417cb9bfe043a6b482dd9657f1d05ec5f743561e26bb4b3cd5 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 282615, Num Unique Notes Rated: 33822, Num Unique Raters: 3359 | |
INFO:birdwatch.scorer:MFGroupScorer_8 Prepare ratings elapsed time: 0.22 secs (0.00 mins) | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_8: 21aff49145ffc6c6823623d32dc867cb7d92e23b2daca12a5e21b950438ea579 | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_8: 6f492efd4f4511d61fd17a59c8e01c6947c592b05ce271b1f1bd38d02d012c73 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_8: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 3359, Notes: 33822 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 8.35595174738336 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 84.13664781184876 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.674127578735352 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.181101322174072 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.34746503829956055 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2549097239971161 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.15791937708854675 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11638370901346207 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10570421814918518 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0701366737484932 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.09876681864261627 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0659034475684166 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.09791554510593414 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06515109539031982 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.09780553728342056 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0650898814201355 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.09779118001461029 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06507830321788788 | |
INFO:birdwatch.matrix_factorization:Num epochs: 149 | |
INFO:birdwatch.matrix_factorization:epoch 149 0.09779003262519836 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06507452577352524 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1738341599702835 | |
INFO:birdwatch.scorer:MFGroupScorer_8 First MF/stable init elapsed time: 5.48 secs (0.09 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_8 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.68 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.18 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.86 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.03 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.72 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.66 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.12 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.81 secs (0.01 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.427992 | time=80.3s | |
INFO:birdwatch.matrix_factorization:epoch 100 0.1107766330242157 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08563681691884995 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.11184670031070709 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08652419596910477 | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 35.05 secs (0.58 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_8 Compute scored notes elapsed time: 43.22 secs (0.72 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.note_ratings:Total ratings: 769257 post-tombstones and 225 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 615393, including 615393 post-tombstones and 0 pre-tombstones. | |
INFO:birdwatch.note_ratings:Total valid ratings: 23143 | |
INFO:birdwatch.scorer:MFGroupScorer_8 Compute valid ratings elapsed time: 1.23 secs (0.02 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_8 Helpfulness scores pre-harassment elapsed time: 0.16 secs (0.00 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 3359 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 22924 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 3088 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 2814 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 282615 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 257622 | |
INFO:birdwatch.scorer:MFGroupScorer_8 Filtering by helpfulness score elapsed time: 0.33 secs (0.01 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 156339 | |
1 10974 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 90309 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 110666, Num Unique Notes Rated: 14291, Num Unique Raters: 2197 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 105252 | |
1 5414 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.04892198145771962 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 19.440709272257113 with BCEWithLogitsLoss | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 2197, Notes: 14291 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 7.743754810720033 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 50.37141556668184 | |
INFO:birdwatch.matrix_factorization:epoch 0 3.176023006439209 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.2762701511383057 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.6277109384536743 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2770954966545105 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.4019104242324829 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2206452488899231 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.3705410361289978 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.21070684492588043 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.3660554587841034 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.20878325402736664 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.3654824495315552 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.20870225131511688 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.36539945006370544 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.20868779718875885 | |
INFO:birdwatch.matrix_factorization:Num epochs: 131 | |
INFO:birdwatch.matrix_factorization:epoch 131 0.3653915822505951 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.20870687067508698 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.33355405926704407 | |
INFO:birdwatch.scorer:MFGroupScorer_8 Harassment tag consensus elapsed time: 2.71 secs (0.05 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_8 Helpfulness scores post-harassment elapsed time: 0.23 secs (0.00 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 3359 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 22924 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 2951 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 2677 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 282615 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 226464 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 2677, Notes: 33816 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 6.69694819020582 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 84.59618976466193 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.3799591064453125 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.30798590183258057 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.0965469628572464 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06042271852493286 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.0937800258398056 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06146633252501488 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09261040389537811 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05911112204194069 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.0925624668598175 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05906740576028824 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.09254294633865356 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05889003351330757 | |
INFO:birdwatch.matrix_factorization:Num epochs: 103 | |
INFO:birdwatch.matrix_factorization:epoch 103 0.09254394471645355 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05901329964399338 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17319533228874207 | |
INFO:birdwatch.constants:Final round MF elapsed time: 3.27 secs (0.05 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_8 prescoring, about to call diligence with 226464 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 000A0CE0A7410288C107822B15D2B35C5E95715EA946E7... -0.768037 | |
1 0026E9A04A48A9CF87EA5FA9499883B8868F322F089686... 0.371154 | |
2 002E8C0F3F6321C14A72393D1A7CB72049853C81110CAA... -0.482728 | |
3 004DF35A540C1F2CFC12C89E8F0CA622480A4F0A52123C... 0.345110 | |
4 00506BFAD47756108668671B68A5FCCA78046636D92B76... -0.499084 | |
... ... ... | |
2672 FED95DED03904E345304807B78EB74EC32438A0C50F717... 0.296113 | |
2673 FF97899D2A4EEDBDCD42BA1004D5D696AD069094217867... -0.366838 | |
2674 FF98EA5358D2281496E24195141FA88EB6337C53188146... -0.533170 | |
2675 FFA64E61F9B012016BB7ACCFE2FF2E42D57BB570E94452... 0.774155 | |
2676 FFAA122DB59243500CA1C39E0536AAA151881CBD989683... 0.514111 | |
[2677 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 2677, vs. num we are initializing: 2677 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 2677 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=15.891335 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.225421 | time=0.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.777663 | time=1.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.672046 | time=2.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.629744 | time=3.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.606175 | time=4.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.591590 | time=4.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.581974 | time=5.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=2.575238 | time=6.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=2.570729 | time=6.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=2.567490 | time=7.6s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(-0.4778, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=2.567406 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.447663 | time=0.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.438728 | time=1.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.437749 | time=2.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.437749 | time=2.1s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.550993 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.492086 | time=0.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.491305 | time=0.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.491274 | time=1.0s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(0.1921, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 2.5675, 1.4377, 0.4913 | |
INFO:birdwatch.scorer:MFGroupScorer_8 Low Diligence MF elapsed time: 11.06 secs (0.18 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.368680 | time=122.7s | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.80 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.71 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.66 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.78 secs (0.01 mins) | |
INFO:birdwatch.matrix_factorization:epoch 80 0.11184630542993546 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08652402460575104 | |
INFO:birdwatch.matrix_factorization:Num epochs: 109 | |
INFO:birdwatch.matrix_factorization:epoch 109 0.11077479273080826 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0856049582362175 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1622111052274704 | |
INFO:birdwatch.scorer:MFCoreScorer First full MF (initializated with stable-initialization) elapsed time: 820.56 secs (13.68 mins) | |
INFO:birdwatch.scorer:MFCoreScorer First MF/stable init elapsed time: 958.83 secs (15.98 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFCoreScorer | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.353914 | time=159.5s | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 35.61 secs (0.59 mins) | |
INFO:birdwatch.constants:MFGroupScorer_8: Compute tag thresholds for percentiles elapsed time: 0.67 secs (0.01 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.run_scoring:MFGroupScorer_7 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.090662 | time=309.4s | |
INFO:birdwatch.run_scoring:MFGroupScorer_7 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_7 run_scorer_parallelizable: Loading data elapsed time: 25.08 secs (0.42 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_7 set to: 4 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_7. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.348665 | time=195.9s | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.346341 | time=231.3s | |
INFO:birdwatch.matrix_factorization:epoch 100 0.11182431876659393 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08652293682098389 | |
INFO:birdwatch.scorer: Ratings after group filter: 1789121 | |
INFO:birdwatch.scorer:MFGroupScorer_7 Filter input elapsed time: 48.74 secs (0.81 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_7: 3be852fd803cd63aa546567b930534827aa7706cea3a41b924d294888b8f9eaa | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 1217225, Num Unique Notes Rated: 81330, Num Unique Raters: 16657 | |
INFO:birdwatch.scorer:MFGroupScorer_7 Prepare ratings elapsed time: 0.73 secs (0.01 mins) | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_7: 30bc51aebf6295e69d41c0c9131e662f75164de20643ff1fe957909bad119df7 | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_7: 6f492efd4f4511d61fd17a59c8e01c6947c592b05ce271b1f1bd38d02d012c73 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_7: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 16657, Notes: 81330 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 14.966494528464281 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 73.07588401272739 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.313928604125977 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 5.839940071105957 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.3464145064353943 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.25356990098953247 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.15121066570281982 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11476104706525803 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.12070568650960922 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08980465680360794 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.1162211075425148 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08611352741718292 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.1156323105096817 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08556249737739563 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.11555551737546921 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08548048883676529 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.11554571986198425 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08547631651163101 | |
INFO:birdwatch.matrix_factorization:Num epochs: 146 | |
INFO:birdwatch.matrix_factorization:epoch 146 0.11554515361785889 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08547690510749817 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1797574758529663 | |
INFO:birdwatch.scorer:MFGroupScorer_7 First MF/stable init elapsed time: 21.25 secs (0.35 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_7 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.11182397603988647 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08652282506227493 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.345118 | time=266.9s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.70 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.12 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.79 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.70 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.67 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.80 secs (0.01 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.matrix_factorization:Num epochs: 109 | |
INFO:birdwatch.matrix_factorization:epoch 109 0.11182256788015366 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08653662353754044 | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.16366994380950928 | |
INFO:birdwatch.scorer:MFExpansionPlusScorer First full MF (initializated with stable-initialization) elapsed time: 918.10 secs (15.30 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.63 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.13 secs (0.00 mins) | |
INFO:birdwatch.scorer:MFExpansionPlusScorer First MF/stable init elapsed time: 1062.86 secs (17.71 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.84 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.03 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.79 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.67 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.77 secs (0.01 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFExpansionPlusScorer | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.344395 | time=302.5s | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.67 secs (0.58 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_7 Compute scored notes elapsed time: 44.10 secs (0.73 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.note_ratings:Total ratings: 1788761 post-tombstones and 360 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 1352787, including 1352784 post-tombstones and 3 pre-tombstones. | |
INFO:birdwatch.note_ratings:Total valid ratings: 140824 | |
INFO:birdwatch.scorer:MFGroupScorer_7 Compute valid ratings elapsed time: 2.28 secs (0.04 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_7 Helpfulness scores pre-harassment elapsed time: 0.32 secs (0.01 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 16657 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 57414 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 15748 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 13465 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 1217225 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 1033491 | |
INFO:birdwatch.scorer:MFGroupScorer_7 Filtering by helpfulness score elapsed time: 1.53 secs (0.03 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 669357 | |
1 54086 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 310048 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 615954, Num Unique Notes Rated: 45054, Num Unique Raters: 11959 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 578763 | |
1 37191 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.06037950885942781 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 15.561910139549893 with BCEWithLogitsLoss | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 11959, Notes: 45054 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 13.671460913570382 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 51.5054770465758 | |
INFO:birdwatch.matrix_factorization:epoch 0 3.4685802459716797 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.5441439151763916 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=0.090563 | time=437.6s | |
INFO:birdwatch.matrix_factorization:epoch 20 0.6863232851028442 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.379958838224411 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.49561065435409546 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.31935709714889526 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.46877503395080566 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3088076114654541 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.4651464521884918 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3074062168598175 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.4646463990211487 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3072471022605896 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.4645857810974121 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.30719080567359924 | |
INFO:birdwatch.matrix_factorization:Num epochs: 121 | |
INFO:birdwatch.matrix_factorization:epoch 121 0.4645857810974121 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.30719080567359924 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.26841995120048523 | |
INFO:birdwatch.scorer:MFGroupScorer_7 Harassment tag consensus elapsed time: 9.06 secs (0.15 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_7 Helpfulness scores post-harassment elapsed time: 0.47 secs (0.01 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 16657 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 57414 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 14801 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 12518 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 1217225 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 883910 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 12518, Notes: 81273 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 10.875813615837979 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 70.6111199872184 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.37229666113853455 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.302815318107605 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.1123301312327385 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08000205457210541 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.11032749712467194 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08068172633647919 | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.65 secs (0.58 mins) | |
INFO:birdwatch.scorer:MFCoreScorer Compute scored notes elapsed time: 186.37 secs (3.11 mins) | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10872460156679153 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07705345749855042 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.10863447189331055 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07750576734542847 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10859868675470352 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07724955677986145 | |
INFO:birdwatch.matrix_factorization:Num epochs: 102 | |
INFO:birdwatch.matrix_factorization:epoch 102 0.10860097408294678 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07731068134307861 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.18187516927719116 | |
INFO:birdwatch.constants:Final round MF elapsed time: 11.54 secs (0.19 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_7 prescoring, about to call diligence with 883910 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 0001C21FD89AC65310D4D74174C0986CDF457DA24DADAB... 0.026208 | |
1 0003E67BB62E658363186A00B13637CF1A58748C4E4ECE... 0.233759 | |
2 0009FC5E666A87A24C6E0A4F985A0F8128DE237BBB6D7B... 0.334185 | |
3 000F1687C56AB92D846F2B9BFA71AE16D8A88426754E3B... 0.677492 | |
4 001A43AFF5E78A3B5614DE48850B68332B26557D0B6904... 0.075942 | |
... ... ... | |
12513 FFE9CF3FC6CEBF09A2748F1A977245A86BE16A74850C3F... -0.081976 | |
12514 FFEAF4A561DFA90006C71904FB176E3BA20BF932ED1AE6... -0.140429 | |
12515 FFED9EACB703DDAE2E9BBF2B5A7FC35065AB055878F50D... 0.377575 | |
12516 FFEF7AD019F0E1EE28157E1298D5469164E8D7AF2CA91D... -0.251927 | |
12517 FFFBC05DB8408BB532985642C4DE00EC619B062CB60E2E... 0.356822 | |
[12518 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 12518, vs. num we are initializing: 12518 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 12518 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=16.615967 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.510729 | time=2.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.343950 | time=337.8s | |
INFO:birdwatch.matrix_factorization:Num epochs: 112 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.098109 | time=4.1s | |
INFO:birdwatch.matrix_factorization:epoch 112 0.11182186007499695 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08651027083396912 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.16398383677005768 | |
INFO:birdwatch.scorer:MFExpansionScorer First full MF (initializated with stable-initialization) elapsed time: 960.81 secs (16.01 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.018143 | time=6.2s | |
INFO:birdwatch.scorer:MFExpansionScorer First MF/stable init elapsed time: 1100.25 secs (18.34 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.990027 | time=8.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.975597 | time=10.4s | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFExpansionScorer | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.966304 | time=12.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.960111 | time=14.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=2.955841 | time=16.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=2.952745 | time=18.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=2.950481 | time=20.9s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(0.2645, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=2.950412 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.871368 | time=2.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.860657 | time=4.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.859680 | time=6.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=105 | loss=1.859605 | time=7.1s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.548758 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.469256 | time=1.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.468250 | time=2.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.468209 | time=3.1s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(1.2567, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 2.9505, 1.8596, 0.4682 | |
INFO:birdwatch.scorer:MFGroupScorer_7 Low Diligence MF elapsed time: 32.23 secs (0.54 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.343662 | time=371.7s | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.66 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(1.0388, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.343654 | time=0.3s | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.79 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.68 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.65 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.78 secs (0.01 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.100843 | time=33.1s | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.10 secs (0.57 mins) | |
INFO:birdwatch.constants:MFGroupScorer_7: Compute tag thresholds for percentiles elapsed time: 2.13 secs (0.04 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.run_scoring:MFGroupScorer_6 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.run_scoring:MFGroupScorer_6 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_6 run_scorer_parallelizable: Loading data elapsed time: 23.50 secs (0.39 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_6 set to: 4 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.090623 | time=66.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=0.090551 | time=564.9s | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_6. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.note_ratings:Total ratings: 104136031 post-tombstones and 232613 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 82236749, including 82170818 post-tombstones and 65931 pre-tombstones. | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.64 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.15 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.82 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.03 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.81 secs (0.01 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.089773 | time=98.7s | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=2.089745 | time=109.4s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.339131 | time=0.3s | |
INFO:birdwatch.scorer: Ratings after group filter: 5575831 | |
INFO:birdwatch.scorer:MFGroupScorer_6 Filter input elapsed time: 50.17 secs (0.84 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_6: cd5f217b9de24f80a9b1b861baef67f7f2b94eff240e84889055d9e67c6cb79c | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 4770628, Num Unique Notes Rated: 213405, Num Unique Raters: 31721 | |
INFO:birdwatch.scorer:MFGroupScorer_6 Prepare ratings elapsed time: 2.57 secs (0.04 mins) | |
INFO:birdwatch.note_ratings:Total valid ratings: 5640836 | |
INFO:birdwatch.scorer:MFCoreScorer Compute valid ratings elapsed time: 170.12 secs (2.84 mins) | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_6: 1d3ec34217c42287ae2ac660322b05bfa1acf80e51a5312066151d0ea1bb1b46 | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_6: 6f492efd4f4511d61fd17a59c8e01c6947c592b05ce271b1f1bd38d02d012c73 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_6: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.83 secs (0.58 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.260010 | time=21.7s | |
INFO:birdwatch.scorer:MFExpansionPlusScorer Compute scored notes elapsed time: 208.42 secs (3.47 mins) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 31721, Notes: 213405 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 22.354808931374617 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 150.39336717001356 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.matrix_factorization:epoch 0 6.270805358886719 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 5.814360618591309 | |
INFO:birdwatch.scorer:MFCoreScorer Helpfulness scores pre-harassment elapsed time: 6.17 secs (0.10 mins) | |
INFO:birdwatch.matrix_factorization:epoch 20 0.3656359612941742 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2881253957748413 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.259075 | time=43.7s | |
INFO:birdwatch.matrix_factorization:epoch 40 0.14423643052577972 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10766056180000305 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.259036 | time=54.4s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(2.3710, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 3.3437, 2.0897, 0.2590 | |
INFO:birdwatch.scorer:MFGroupScorer_13 Low Diligence MF elapsed time: 561.25 secs (9.35 mins) | |
INFO:birdwatch.matrix_factorization:epoch 60 0.11036738008260727 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07999762147665024 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.matrix_factorization:epoch 80 0.10544703900814056 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07647175341844559 | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.76 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.14 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.91 secs (0.02 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.03 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.80 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.78 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.84 secs (0.01 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10483088344335556 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07613364607095718 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=0.090548 | time=691.2s | |
INFO:birdwatch.matrix_factorization:epoch 120 0.10476754605770111 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07607046514749527 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.64 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.74 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.59 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.73 secs (0.01 mins) | |
INFO:birdwatch.matrix_factorization:epoch 140 0.10475783795118332 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07606589049100876 | |
INFO:birdwatch.matrix_factorization:Num epochs: 147 | |
INFO:birdwatch.matrix_factorization:epoch 147 0.10475708544254303 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07606537640094757 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1682448834180832 | |
INFO:birdwatch.scorer:MFGroupScorer_6 First MF/stable init elapsed time: 80.14 secs (1.34 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_6 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 37.71 secs (0.63 mins) | |
INFO:birdwatch.scorer:MFExpansionScorer Compute scored notes elapsed time: 245.19 secs (4.09 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.61 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.76 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.68 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.60 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.74 secs (0.01 mins) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.53 secs (0.58 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 599301 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 582446 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 469203 | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.76 secs (0.58 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_6 Compute scored notes elapsed time: 48.98 secs (0.82 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.note_ratings:Total ratings: 5574635 post-tombstones and 1196 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 4475905, including 4475836 post-tombstones and 69 pre-tombstones. | |
INFO:birdwatch.note_ratings:Total valid ratings: 343877 | |
INFO:birdwatch.scorer:MFGroupScorer_6 Compute valid ratings elapsed time: 6.27 secs (0.10 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_6 Helpfulness scores pre-harassment elapsed time: 0.64 secs (0.01 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 31721 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 96153 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 26201 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 24774 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 4770628 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 3886196 | |
INFO:birdwatch.scorer:MFGroupScorer_6 Filtering by helpfulness score elapsed time: 6.27 secs (0.10 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 2521193 | |
1 140651 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 1224352 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 2433805, Num Unique Notes Rated: 127698, Num Unique Raters: 23472 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 2320223 | |
1 113582 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.04666848823139076 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 20.427735028437606 with BCEWithLogitsLoss | |
INFO:birdwatch.note_ratings:Total ratings: 120703210 post-tombstones and 241978 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 95191946, including 95124470 post-tombstones and 67476 pre-tombstones. | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 23472, Notes: 127698 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 19.059069053548214 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 103.68971540558964 | |
INFO:birdwatch.matrix_factorization:epoch 0 3.356210231781006 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.4606125354766846 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.6742969751358032 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.34321776032447815 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.4409317076206207 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2760968804359436 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.4060561954975128 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2636943459510803 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 438164 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 102895565 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 79349995 | |
INFO:birdwatch.scorer:MFCoreScorer Filtering by helpfulness score elapsed time: 162.25 secs (2.70 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.matrix_factorization:epoch 80 0.4013458490371704 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.26154324412345886 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.40070536732673645 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2612432539463043 | |
INFO:birdwatch.constants:MFGroupScorer_13: Compute tag thresholds for percentiles elapsed time: 65.29 secs (1.09 mins) | |
INFO:birdwatch.matrix_factorization:epoch 120 0.4006197452545166 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.26121625304222107 | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 49732971 | |
1 3372716 | |
dtype: int64 | |
INFO:birdwatch.matrix_factorization:Num epochs: 126 | |
INFO:birdwatch.matrix_factorization:epoch 126 0.4006158709526062 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.26122498512268066 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.24334657192230225 | |
INFO:birdwatch.scorer:MFGroupScorer_6 Harassment tag consensus elapsed time: 37.71 secs (0.63 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 26175143 | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_6 Helpfulness scores post-harassment elapsed time: 1.23 secs (0.02 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 31721 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 96153 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 23491 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 22064 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 4770628 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 3031814 | |
INFO:birdwatch.note_ratings:Total valid ratings: 7181568 | |
INFO:birdwatch.scorer:MFExpansionPlusScorer Compute valid ratings elapsed time: 187.39 secs (3.12 mins) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 22064, Notes: 212402 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 14.27394280656491 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 137.4099891225526 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.38587597012519836 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3186874985694885 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=0.090548 | time=816.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=0.090548 | time=816.7s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.007627 | time=0.9s | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFExpansionPlusScorer Helpfulness scores pre-harassment elapsed time: 7.35 secs (0.12 mins) | |
INFO:birdwatch.matrix_factorization:epoch 20 0.1026848629117012 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07053884118795395 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.0991692915558815 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0704842284321785 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09788583219051361 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06807220727205276 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.09779573976993561 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0678388699889183 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 52447380, Num Unique Notes Rated: 1020212, Num Unique Raters: 421840 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 49134759 | |
1 3312621 | |
dtype: int64 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.09777955710887909 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0676669180393219 | |
INFO:birdwatch.matrix_factorization:Num epochs: 103 | |
INFO:birdwatch.matrix_factorization:epoch 103 0.09777899086475372 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06779856234788895 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17027948796749115 | |
INFO:birdwatch.constants:Final round MF elapsed time: 42.47 secs (0.71 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_6 prescoring, about to call diligence with 3031814 final round ratings. | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.06316084807286847 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 14.832592983018582 with BCEWithLogitsLoss | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 0002188E5ED3028646C97CBE9ADCD12CB5B8BFAF8819BD... -0.173736 | |
1 0002EEF8B312A7DCBF698391778CD9D0F7ADA652FBFB9E... -0.284836 | |
2 0005983E6E18862483AB372C5B61FEBC1F8A573E7701F9... -0.507332 | |
3 000677AE7F63255B464AD153D315B2E25DB8BF771A379D... 0.479094 | |
4 000760B0C9739248AF3CA6B833A219CC24A4B85C5B4D0D... 0.208708 | |
... ... ... | |
22059 FFFAA9B8DDDDF9C3CD12F97B13C1658E63F495884418D6... 0.013979 | |
22060 FFFBB8B4BE340D5AAC99E9168F2711EBAB3CE5C9A2567B... -0.123964 | |
22061 FFFC8248F057883916F06F78A0DB7878BFB2C6162434E2... -0.551289 | |
22062 FFFD65E501817C7A5590FADEE2646D40BF1BA5582F6801... -0.332513 | |
22063 FFFFAB2FDBC1968F4CFE97A86D88963D702B636365B6CD... -0.005224 | |
[22064 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 22064, vs. num we are initializing: 22064 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 22064 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=15.466070 | time=0.0s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.note_ratings:Total ratings: 120701007 post-tombstones and 241977 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 95190276, including 95122800 post-tombstones and 67476 pre-tombstones. | |
INFO:birdwatch.run_scoring:MFGroupScorer_5 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.406949 | time=7.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.023018 | time=15.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.962512 | time=22.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.944169 | time=29.1s | |
INFO:birdwatch.run_scoring:MFGroupScorer_5 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_5 run_scorer_parallelizable: Loading data elapsed time: 25.15 secs (0.42 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_5 set to: 4 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_5. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.935391 | time=35.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.930055 | time=42.8s | |
INFO:birdwatch.note_ratings:Total valid ratings: 7193682 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.926494 | time=49.8s | |
INFO:birdwatch.scorer:MFExpansionScorer Compute valid ratings elapsed time: 188.52 secs (3.14 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.007072 | time=84.6s | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 421840, Notes: 1020212 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 51.40831513450146 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 124.3300303432581 | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=2.924022 | time=56.8s | |
INFO:birdwatch.matrix_factorization:epoch 0 3.225405693054199 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.4507755041122437 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFExpansionScorer Helpfulness scores pre-harassment elapsed time: 8.65 secs (0.14 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=2.922227 | time=63.9s | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=2.920981 | time=71.1s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(0.2720, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=2.920947 | time=0.0s | |
INFO:birdwatch.scorer: Ratings after group filter: 565869 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.758078 | time=6.9s | |
INFO:birdwatch.scorer:MFGroupScorer_5 Filter input elapsed time: 49.08 secs (0.82 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_5: 9303f55444e67e6fd23626d4b2885862c1fa54b649feb313806c66b929d36b0a | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 242685, Num Unique Notes Rated: 22638, Num Unique Raters: 3902 | |
INFO:birdwatch.scorer:MFGroupScorer_5 Prepare ratings elapsed time: 0.17 secs (0.00 mins) | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_5: c176786745703fe882d386c5d9f08dceb79f55a961fa16bbd6f82f9e892c3323 | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_5: 6f492efd4f4511d61fd17a59c8e01c6947c592b05ce271b1f1bd38d02d012c73 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_5: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 3902, Notes: 22638 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 10.720249138616486 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 62.19502819067145 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.585812568664551 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.091764450073242 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.28472474217414856 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.20711880922317505 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.13348007202148438 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09200548380613327 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10149544477462769 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.065911203622818 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.09650790691375732 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06248011067509651 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.0958905965089798 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.061985645443201065 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.09580732882022858 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06190083548426628 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.09579630196094513 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.061905086040496826 | |
INFO:birdwatch.matrix_factorization:Num epochs: 148 | |
INFO:birdwatch.matrix_factorization:epoch 148 0.09579534083604813 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.061904676258563995 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.18610632419586182 | |
INFO:birdwatch.scorer:MFGroupScorer_5 First MF/stable init elapsed time: 4.12 secs (0.07 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_5 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.746932 | time=13.8s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.61 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.80 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.65 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.62 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.746044 | time=20.7s | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.77 secs (0.01 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.746017 | time=23.0s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.447986 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.380310 | time=4.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.379469 | time=8.2s | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 760595 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 722512 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 616331 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.379433 | time=10.2s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(1.3191, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 2.9210, 1.7460, 0.3794 | |
INFO:birdwatch.scorer:MFGroupScorer_6 Low Diligence MF elapsed time: 107.97 secs (1.80 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.66 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.70 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.68 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.56 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.77 secs (0.01 mins) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.39 secs (0.57 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_5 Compute scored notes elapsed time: 41.87 secs (0.70 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.note_ratings:Total ratings: 565731 post-tombstones and 138 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 449626, including 449626 post-tombstones and 0 pre-tombstones. | |
INFO:birdwatch.note_ratings:Total valid ratings: 27112 | |
INFO:birdwatch.scorer:MFGroupScorer_5 Compute valid ratings elapsed time: 0.95 secs (0.02 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_5 Helpfulness scores pre-harassment elapsed time: 0.12 secs (0.00 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 3902 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 17268 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 3700 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 3232 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 242685 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 215077 | |
INFO:birdwatch.scorer:MFGroupScorer_5 Filtering by helpfulness score elapsed time: 0.28 secs (0.00 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 142446 | |
1 11797 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 60834 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 119926, Num Unique Notes Rated: 11584, Num Unique Raters: 2761 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 113615 | |
1 6311 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.05262411820622717 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 18.002693709396294 with BCEWithLogitsLoss | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 2761, Notes: 11584 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 10.352727900552486 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 43.43571169865991 | |
INFO:birdwatch.matrix_factorization:epoch 0 3.3174281120300293 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.4138611555099487 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.635836124420166 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.292518675327301 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.42002376914024353 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.238921657204628 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.7049931287765503 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.42920413613319397 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.38807183504104614 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2273017019033432 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.3837983012199402 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2260620892047882 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.3833158612251282 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.22575566172599792 | |
INFO:birdwatch.matrix_factorization:Num epochs: 101 | |
INFO:birdwatch.matrix_factorization:epoch 101 0.3833158612251282 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.22575566172599792 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.27825331687927246 | |
INFO:birdwatch.scorer:MFGroupScorer_5 Harassment tag consensus elapsed time: 1.79 secs (0.03 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_5 Helpfulness scores post-harassment elapsed time: 0.14 secs (0.00 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 3902 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 17268 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 3510 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 3042 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 242685 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 187316 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 3042, Notes: 22627 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 8.278428426216466 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 61.576594345825114 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.38187411427497864 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3069080412387848 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.09649358689785004 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05959121882915497 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.09225410223007202 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05889792740345001 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09085475653409958 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05636866018176079 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.09072448313236237 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05619700625538826 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.09070117771625519 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05592891573905945 | |
INFO:birdwatch.matrix_factorization:Num epochs: 103 | |
INFO:birdwatch.matrix_factorization:epoch 103 0.09070120751857758 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05606788769364357 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.18550153076648712 | |
INFO:birdwatch.constants:Final round MF elapsed time: 2.47 secs (0.04 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_5 prescoring, about to call diligence with 187316 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.006998 | time=167.8s | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 001670E335A2559879EA4C5497E9469BD163D949F32CFB... 0.706790 | |
1 003DAEE4C05D42B92583AD7BB4E5FC40051E7EDB8A34F4... -0.543804 | |
2 0078B6E44FB3B19530E03D5FF363823AE29AEF431E16A4... -0.303043 | |
3 007931FC488902DD0A8CB7AA24BFAB189E614C73CCAB9E... -0.446241 | |
4 009C72D58070EFB66CCE4A16846DF830BFF7C4D3D6352B... -0.458086 | |
... ... ... | |
3037 FFAC3C1B41112324A7D9677419DF2C179D47327EFC3458... -0.262326 | |
3038 FFB5DC98D9D19D482617D7D9F61B91DFB74F2B5588EADC... 0.322892 | |
3039 FFBF66FB8FE4AEF510F7CD3F18B24F5FCCD83CFBFB4F0E... -0.670645 | |
3040 FFC5FEB6111C3D7EEE8617D8CDE530946BE44871355D9D... 0.007671 | |
3041 FFD53734E61F61EBA6B7498BD376679CF43F368AAD28C7... -0.584890 | |
[3042 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 3042, vs. num we are initializing: 3042 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 3042 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=16.192408 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.245615 | time=0.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.806309 | time=0.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.715728 | time=1.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.681431 | time=1.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.661712 | time=2.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.649036 | time=2.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.640733 | time=3.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=2.635088 | time=3.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=2.631102 | time=4.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=2.628285 | time=4.8s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(-0.1594, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=2.628208 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.548235 | time=0.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.538268 | time=0.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.537306 | time=1.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.537274 | time=1.6s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.552759 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.490480 | time=0.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.489672 | time=0.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.489639 | time=0.7s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(0.6222, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 2.6283, 1.5373, 0.4896 | |
INFO:birdwatch.scorer:MFGroupScorer_5 Low Diligence MF elapsed time: 7.27 secs (0.12 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.59 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.70 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.61 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.60 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.80 secs (0.01 mins) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 35.87 secs (0.60 mins) | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 563491 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 119170712 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 91836703 | |
INFO:birdwatch.scorer:MFExpansionPlusScorer Filtering by helpfulness score elapsed time: 192.15 secs (3.20 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.constants:MFGroupScorer_6: Compute tag thresholds for percentiles elapsed time: 8.04 secs (0.13 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.run_scoring:MFGroupScorer_4 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 58082513 | |
1 3969624 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 29714177 | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.69 secs (0.58 mins) | |
INFO:birdwatch.constants:MFGroupScorer_5: Compute tag thresholds for percentiles elapsed time: 0.56 secs (0.01 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.run_scoring:MFGroupScorer_3 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.reputation_matrix_factorization:epoch=080 | loss=0.006995 | time=223.2s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(0.6820, requires_grad=True) | |
INFO:birdwatch.helpfulness_model:Helpfulness reputation loss: 0.0150, 0.0905, 0.0070 | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/reputation_scorer.py, in _prescore_notes_and_users, at line 135: noteStats = noteStats.merge(noteStatusHistory[[c.noteIdKey]].drop_duplicates(), how="outer") | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.reputation_scorer:Reputation prescoring: returning these columns: | |
noteStats: Index(['noteId', 'internalNoteIntercept', 'internalNoteFactor1', | |
'lowDiligenceNoteInterceptRound2'], | |
dtype='object') | |
raterStats: Index(['raterParticipantId', 'internalRaterReputation', | |
'internalRaterIntercept', 'internalRaterFactor1', | |
'lowDiligenceRaterInterceptRound2'], | |
dtype='object') | |
INFO:birdwatch.run_scoring:MFGroupScorer_2 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 760575 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 722844 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 616660 | |
INFO:birdwatch.run_scoring:MFGroupScorer_4 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_4 run_scorer_parallelizable: Loading data elapsed time: 27.09 secs (0.45 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_4 set to: 4 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.48327550292015076 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3420565128326416 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_4. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.run_scoring:MFGroupScorer_3 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_3 run_scorer_parallelizable: Loading data elapsed time: 24.00 secs (0.40 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_3 set to: 4 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_3. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.run_scoring:MFGroupScorer_2 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_2 run_scorer_parallelizable: Loading data elapsed time: 25.74 secs (0.43 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_2 set to: 4 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 61343877, Num Unique Notes Rated: 1117500, Num Unique Raters: 540285 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_2. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 57434887 | |
1 3908990 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.06372257821265519 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 14.693024796686615 with BCEWithLogitsLoss | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings after group filter: 1940497 | |
INFO:birdwatch.scorer:MFGroupScorer_4 Filter input elapsed time: 49.06 secs (0.82 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_4: c16493959faeadf885156bdc217f054675df34ad11a196bf9dbfbf041e7b0d9d | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 1528388, Num Unique Notes Rated: 61169, Num Unique Raters: 14590 | |
INFO:birdwatch.scorer:MFGroupScorer_4 Prepare ratings elapsed time: 0.84 secs (0.01 mins) | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_4: 139ddfaad6f909d2aaf8ab1ac13e3e672bcadd2449b28f69aa92f19d394951dd | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_4: 6f492efd4f4511d61fd17a59c8e01c6947c592b05ce271b1f1bd38d02d012c73 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_4: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 563746 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 119168555 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 91883196 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 14590, Notes: 61169 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 24.986316598276904 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 104.75586017820424 | |
INFO:birdwatch.scorer:MFExpansionScorer Filtering by helpfulness score elapsed time: 195.38 secs (3.26 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.matrix_factorization:epoch 0 6.678780555725098 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.203587532043457 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.33320388197898865 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.27523019909858704 | |
INFO:birdwatch.scorer: Ratings after group filter: 6272990 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer:MFGroupScorer_3 Filter input elapsed time: 51.71 secs (0.86 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.17947255074977875 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.14979363977909088 | |
INFO:birdwatch.scorer: Ratings after group filter: 1508019 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10804477334022522 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07586950063705444 | |
INFO:birdwatch.scorer:MFGroupScorer_2 Filter input elapsed time: 47.23 secs (0.79 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_3: e98c320e953427f3f662eed78e6fa61b568d985b49a5cb8826d5a7415f206fa5 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.09873808920383453 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06912253797054291 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_2: 2c0f5c6b8075f55d58fcbd5c275aa476998ccb546834d7c8ef1483a1f8591218 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 904592, Num Unique Notes Rated: 69485, Num Unique Raters: 9848 | |
INFO:birdwatch.scorer:MFGroupScorer_2 Prepare ratings elapsed time: 0.55 secs (0.01 mins) | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 58111011 | |
1 3975975 | |
dtype: int64 | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_2: ed1ed36c8b0fa771c0980983fb80f3507a6ae8f6560fa3b825ff693fbafa85e3 | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_2: 6f492efd4f4511d61fd17a59c8e01c6947c592b05ce271b1f1bd38d02d012c73 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_2: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 5544045, Num Unique Notes Rated: 171473, Num Unique Raters: 51204 | |
INFO:birdwatch.scorer:MFGroupScorer_3 Prepare ratings elapsed time: 3.13 secs (0.05 mins) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 9848, Notes: 69485 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:epoch 100 0.09727585315704346 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06815438717603683 | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 13.018521983161833 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 91.85540211210397 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.530896186828613 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.049686431884766 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 29726079 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.3711766004562378 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2780326306819916 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.09709788113832474 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0680956169962883 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.14695483446121216 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10751470923423767 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.44822174310684204 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3275172710418701 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.11536134779453278 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0840974748134613 | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_3: 10e1e92cb2663396cae543faabfa6082e56b1f2fbc3f84182d932d4c87d16ff2 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.0970749631524086 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06805853545665741 | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_3: 6f492efd4f4511d61fd17a59c8e01c6947c592b05ce271b1f1bd38d02d012c73 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.11034777760505676 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08022522181272507 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_3: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
INFO:birdwatch.matrix_factorization:Num epochs: 153 | |
INFO:birdwatch.matrix_factorization:epoch 153 0.09707260876893997 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06805459409952164 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.16788910329341888 | |
INFO:birdwatch.scorer:MFGroupScorer_4 First MF/stable init elapsed time: 26.38 secs (0.44 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_4 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10969796776771545 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07964546978473663 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.10959462821483612 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07957520335912704 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 51204, Notes: 171473 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 32.33188315361602 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 108.27367002577924 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.160154342651367 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 5.7076520919799805 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.10956564545631409 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07954714447259903 | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.65 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.matrix_factorization:Num epochs: 156 | |
INFO:birdwatch.matrix_factorization:epoch 156 0.10956329107284546 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07954873144626617 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17118540406227112 | |
INFO:birdwatch.scorer:MFGroupScorer_2 First MF/stable init elapsed time: 15.08 secs (0.25 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_2 | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.19 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.83 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.03 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.66 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.61 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.72 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.66 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.75 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.65 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 540285, Notes: 1117500 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.71 secs (0.01 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 54.89384966442953 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 113.53984841333741 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.34891653060913086 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2760895788669586 | |
INFO:birdwatch.matrix_factorization:epoch 0 3.227686882019043 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.470324158668518 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.1404491811990738 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10607077926397324 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 61378965, Num Unique Notes Rated: 1117642, Num Unique Raters: 540525 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 57463556 | |
1 3915409 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.11468944698572159 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08667495846748352 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.06379073026076605 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 14.676258853161954 with BCEWithLogitsLoss | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 35.77 secs (0.60 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_4 Compute scored notes elapsed time: 44.85 secs (0.75 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.note_ratings:Total ratings: 1940185 post-tombstones and 312 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 1600817, including 1600817 post-tombstones and 0 pre-tombstones. | |
INFO:birdwatch.note_ratings:Total valid ratings: 110728 | |
INFO:birdwatch.scorer:MFGroupScorer_4 Compute valid ratings elapsed time: 2.15 secs (0.04 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_4 Helpfulness scores pre-harassment elapsed time: 0.30 secs (0.00 mins) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.87 secs (0.56 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_2 Compute scored notes elapsed time: 42.42 secs (0.71 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 14590 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 39083 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 12156 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 11347 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 1528388 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 1269605 | |
INFO:birdwatch.scorer:MFGroupScorer_4 Filtering by helpfulness score elapsed time: 1.91 secs (0.03 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.note_ratings:Total ratings: 1507617 post-tombstones and 402 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 1192400, including 1192400 post-tombstones and 0 pre-tombstones. | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 775946 | |
1 81413 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 412246 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 791283, Num Unique Notes Rated: 38139, Num Unique Raters: 10667 | |
INFO:birdwatch.note_ratings:Total valid ratings: 82864 | |
INFO:birdwatch.scorer:MFGroupScorer_2 Compute valid ratings elapsed time: 1.56 secs (0.03 mins) | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 720521 | |
1 70762 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.08942691805586624 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 10.18231536700489 with BCEWithLogitsLoss | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_2 Helpfulness scores pre-harassment elapsed time: 0.22 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 10667, Notes: 38139 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 20.747345237158815 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 74.1804631105278 | |
INFO:birdwatch.matrix_factorization:epoch 0 3.4033613204956055 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.5108660459518433 | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 9848 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 42685 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 9127 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 7946 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 904592 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 782180 | |
INFO:birdwatch.scorer:MFGroupScorer_2 Filtering by helpfulness score elapsed time: 1.14 secs (0.02 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 495140 | |
1 25622 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 261418 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 420491, Num Unique Notes Rated: 34040, Num Unique Raters: 7004 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 405316 | |
1 15175 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.03608876289861138 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 26.709456342668865 with BCEWithLogitsLoss | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 7004, Notes: 34040 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 12.352849588719154 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 60.03583666476299 | |
INFO:birdwatch.matrix_factorization:epoch 0 3.1174519062042236 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.2097327709197998 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.6702408194541931 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.35780924558639526 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.6042913198471069 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2741597890853882 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.37693291902542114 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2036038637161255 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.4510749876499176 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2899180054664612 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.11040268838405609 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08275434374809265 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.33923542499542236 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.19063740968704224 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.3333747982978821 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.188731387257576 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.4177025556564331 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2781203091144562 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.332400918006897 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1884804368019104 | |
INFO:birdwatch.matrix_factorization:Num epochs: 102 | |
INFO:birdwatch.matrix_factorization:epoch 102 0.3324318528175354 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.18854758143424988 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.28552383184432983 | |
INFO:birdwatch.scorer:MFGroupScorer_2 Harassment tag consensus elapsed time: 5.38 secs (0.09 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_2 Helpfulness scores post-harassment elapsed time: 0.32 secs (0.01 mins) | |
INFO:birdwatch.matrix_factorization:epoch 80 0.41320937871932983 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.275778204202652 | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 9848 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 42685 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 8606 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 7425 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 904592 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 663894 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 7425, Notes: 69416 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 9.56399101071799 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 89.41333333333333 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.3908115029335022 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3216673731803894 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.41262438893318176 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2755293548107147 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.10757246613502502 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07446270436048508 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.4125490188598633 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2755182087421417 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10324550420045853 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07333981245756149 | |
INFO:birdwatch.matrix_factorization:Num epochs: 132 | |
INFO:birdwatch.matrix_factorization:epoch 132 0.4125423729419708 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2755069434642792 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.2101941704750061 | |
INFO:birdwatch.scorer:MFGroupScorer_4 Harassment tag consensus elapsed time: 12.91 secs (0.22 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_4 Helpfulness scores post-harassment elapsed time: 0.43 secs (0.01 mins) | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10175684094429016 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07066280394792557 | |
INFO:birdwatch.matrix_factorization:Num epochs: 73 | |
INFO:birdwatch.matrix_factorization:epoch 73 0.10167042911052704 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07047589123249054 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17347291111946106 | |
INFO:birdwatch.constants:Final round MF elapsed time: 6.61 secs (0.11 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_2 prescoring, about to call diligence with 663894 final round ratings. | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 14590 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 39083 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 10836 | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 00007F6B0991C1CA1DF283A7615A79999117CAC8C962A5... 0.070908 | |
1 0000FDC49B38F4C994CAA60961F88FB421B03D0D43F499... -0.285049 | |
2 001BE45AE64F526CFC3CC1B706DE3D812A6063976CA65D... -0.403973 | |
3 0020A81474D2B3E0479ED2BB0A5577F54852D9381A5DD3... -0.163264 | |
4 00378C3EEC142CC75F89B1EBCE084827285C05474A43F0... 0.015326 | |
... ... ... | |
7420 FFE33E8172BAD7A1575F60FCAB8012D6BE7798D2C8A26D... -0.322799 | |
7421 FFEC26DAD31FB175031B1A676DACDDFE983F60DAFA8985... -0.639325 | |
7422 FFF8F9C2C8D0118227B1D6295B8CF7BA535B2A44B2EDEF... -0.539850 | |
7423 FFFF33553CB8A72FF1CB6FB663CED93F292F0D2C161852... -0.467893 | |
7424 FFFF82FC0D34E74125C0E5C894E335531C58342FB7C039... -0.882820 | |
[7425 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 7425, vs. num we are initializing: 7425 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 7425 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=16.804531 | time=0.0s | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 10027 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 1528388 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 983865 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 10027, Notes: 61107 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 16.100692228386272 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 98.1215717562581 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.3892819285392761 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3207399547100067 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.445824 | time=1.5s | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10988889634609222 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08249671757221222 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.44207409024238586 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3240014910697937 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.996797 | time=3.1s | |
INFO:birdwatch.matrix_factorization:epoch 20 0.09804782271385193 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06480919569730759 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.909093 | time=4.6s | |
INFO:birdwatch.matrix_factorization:epoch 40 0.09472730755805969 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06550641357898712 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.877876 | time=6.2s | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09416499733924866 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06453821808099747 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.861392 | time=7.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.851243 | time=9.3s | |
INFO:birdwatch.matrix_factorization:epoch 80 0.0940105989575386 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06372792273759842 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.844467 | time=10.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=2.839929 | time=12.4s | |
INFO:birdwatch.matrix_factorization:epoch 100 0.09400354325771332 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06372714787721634 | |
INFO:birdwatch.matrix_factorization:Num epochs: 101 | |
INFO:birdwatch.matrix_factorization:epoch 101 0.09400354325771332 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06372714787721634 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17005860805511475 | |
INFO:birdwatch.constants:Final round MF elapsed time: 14.03 secs (0.23 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_4 prescoring, about to call diligence with 983865 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 0011AB5425173F62E5D4A1787E34ED324BDD5807D4C3B8... -0.560774 | |
1 001C8D32D1F35CAC07983265BA3F769C6976F5A71141E4... 0.412092 | |
2 0026D52237BA91FDF564C99A30B594C53E0E5E7CF76F5C... 0.575215 | |
3 003B5BBD63338E6ECB7DA6F16AC010576B506676849D76... 0.310459 | |
4 003CE80F068D189A05BBA9748FCA578819680378FBDEB7... -0.453448 | |
... ... ... | |
10022 FFD3B8B9E935D1D393558464F9172AF81C6CF5E76C31EA... 0.378896 | |
10023 FFDCC6136CBDCE1394D680A912CB4203DE5D035006979B... 0.509281 | |
10024 FFEC392A6B742286C786DE71BB4102B6804FF360A00B3A... 0.174046 | |
10025 FFF10C79740909DEDBBF234382D89BF3F3D4750C5E983B... -0.228667 | |
10026 FFF89590FF300D0348631F2F16AA908F663A888A3F82E0... -0.307492 | |
[10027 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 10027, vs. num we are initializing: 10027 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 10027 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=19.271133 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=2.836694 | time=13.9s | |
INFO:birdwatch.matrix_factorization:epoch 120 0.10983089357614517 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08246933668851852 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=2.834486 | time=15.4s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(0.0472, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=2.834427 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.629673 | time=2.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.690030 | time=1.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.151716 | time=4.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.680003 | time=3.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.679054 | time=4.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.078675 | time=7.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=110 | loss=1.678900 | time=5.6s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.499240 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.428317 | time=0.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.427418 | time=1.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.058941 | time=9.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.427381 | time=2.2s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(1.0585, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 2.8345, 1.6789, 0.4274 | |
INFO:birdwatch.scorer:MFGroupScorer_2 Low Diligence MF elapsed time: 24.02 secs (0.40 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.050554 | time=11.8s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.matrix_factorization:epoch 140 0.1098230704665184 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.082444928586483 | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.045908 | time=14.3s | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.73 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.80 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.62 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:Num epochs: 147 | |
INFO:birdwatch.matrix_factorization:epoch 147 0.10982239246368408 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08244722336530685 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.16845667362213135 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.042941 | time=16.7s | |
INFO:birdwatch.scorer:MFGroupScorer_3 First MF/stable init elapsed time: 95.98 secs (1.60 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_3 | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.83 secs (0.01 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.040934 | time=19.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.039437 | time=21.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.038357 | time=23.7s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(0.7240, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.038324 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.909907 | time=2.3s | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.62 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 540525, Notes: 1117642 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.79 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:epoch 20 0.7069830298423767 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.4395275115966797 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.899021 | time=4.5s | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.66 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.60 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 54.91826989322162 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 113.5543499375607 | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.80 secs (0.01 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.898278 | time=7.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.898255 | time=7.7s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.454790 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.381626 | time=1.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.380737 | time=2.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.380699 | time=3.4s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(1.7993, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 3.0384, 1.8983, 0.3807 | |
INFO:birdwatch.scorer:MFGroupScorer_4 Low Diligence MF elapsed time: 36.15 secs (0.60 mins) | |
INFO:birdwatch.matrix_factorization:epoch 0 3.2270126342773438 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.4697400331497192 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.74 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.66 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.75 secs (0.01 mins) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.10 secs (0.57 mins) | |
INFO:birdwatch.constants:MFGroupScorer_2: Compute tag thresholds for percentiles elapsed time: 1.63 secs (0.03 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.run_scoring:MFGroupScorer_1 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.matrix_factorization:epoch 100 0.44119179248809814 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.32353708148002625 | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 35.33 secs (0.59 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_3 Compute scored notes elapsed time: 49.03 secs (0.82 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.note_ratings:Total ratings: 6271825 post-tombstones and 1165 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 4659256, including 4659244 post-tombstones and 12 pre-tombstones. | |
INFO:birdwatch.note_ratings:Total valid ratings: 490631 | |
INFO:birdwatch.scorer:MFGroupScorer_3 Compute valid ratings elapsed time: 7.46 secs (0.12 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_3 Helpfulness scores pre-harassment elapsed time: 0.79 secs (0.01 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 51204 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 102321 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 44063 | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 36.78 secs (0.61 mins) | |
INFO:birdwatch.run_scoring:MFGroupScorer_1 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_1 run_scorer_parallelizable: Loading data elapsed time: 25.12 secs (0.42 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_1 set to: 4 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 40373 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 5544045 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 4520142 | |
INFO:birdwatch.scorer:MFGroupScorer_3 Filtering by helpfulness score elapsed time: 7.47 secs (0.12 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 3026563 | |
1 148763 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 1344816 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 2998892, Num Unique Notes Rated: 108472, Num Unique Raters: 38303 | |
INFO:birdwatch.constants:MFGroupScorer_4: Compute tag thresholds for percentiles elapsed time: 2.21 secs (0.04 mins) | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 2869664 | |
1 129228 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.043091915280710345 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 22.206209180672918 with BCEWithLogitsLoss | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 38303, Notes: 108472 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 27.646692233940556 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 78.29391953632874 | |
INFO:birdwatch.run_scoring:MFGroupScorer_14 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.matrix_factorization:epoch 0 3.463493585586548 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.5612633228302002 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_1. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.6911249160766602 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3789512813091278 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.45819544792175293 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.29484790563583374 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.4250757694244385 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.28319868445396423 | |
INFO:birdwatch.run_scoring:MFGroupScorer_14 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_14 run_scorer_parallelizable: Loading data elapsed time: 24.93 secs (0.42 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_14 set to: 4 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.48641249537467957 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.34751391410827637 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.4205896258354187 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2810693085193634 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_14. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.7068841457366943 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.4395037293434143 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.4199836552143097 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2807298004627228 | |
INFO:birdwatch.matrix_factorization:Num epochs: 104 | |
INFO:birdwatch.matrix_factorization:epoch 104 0.4199780225753784 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2807644009590149 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.24957284331321716 | |
INFO:birdwatch.scorer:MFGroupScorer_3 Harassment tag consensus elapsed time: 42.80 secs (0.71 mins) | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_3 Helpfulness scores post-harassment elapsed time: 1.12 secs (0.02 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 51204 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 102321 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 38319 | |
INFO:birdwatch.scorer: Ratings after group filter: 6106169 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 34629 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 5544045 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 3271398 | |
INFO:birdwatch.scorer:MFGroupScorer_1 Filter input elapsed time: 52.21 secs (0.87 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 34629, Notes: 170629 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 19.172579104372645 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 94.46989517456467 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.37313419580459595 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3077939450740814 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.44105207920074463 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.32347816228866577 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_1: 40984a6198d81aae57f48c13da68514b5f8cd105236ca3356b05f29c643d788d | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 5427307, Num Unique Notes Rated: 146459, Num Unique Raters: 56037 | |
INFO:birdwatch.scorer:MFGroupScorer_1 Prepare ratings elapsed time: 2.95 secs (0.05 mins) | |
INFO:birdwatch.matrix_factorization:epoch 20 0.1064104437828064 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07622941583395004 | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_1: f1a506ab855814d809e5a88ab0a1da18454e1be6218d52e0f4181b0724376828 | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_1: 6f492efd4f4511d61fd17a59c8e01c6947c592b05ce271b1f1bd38d02d012c73 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_1: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10472710430622101 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07755328714847565 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 56037, Notes: 146459 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 37.05683501867417 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 96.85220479326159 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.283523082733154 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 5.8318586349487305 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10342289507389069 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07481147348880768 | |
INFO:birdwatch.scorer: Ratings after group filter: 11425870 | |
INFO:birdwatch.matrix_factorization:Num epochs: 128 | |
INFO:birdwatch.matrix_factorization:epoch 128 0.4410378336906433 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.32347553968429565 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.20342975854873657 | |
INFO:birdwatch.scorer:MFGroupScorer_14 Filter input elapsed time: 52.69 secs (0.88 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.scorer:MFCoreScorer Harassment tag consensus elapsed time: 584.45 secs (9.74 mins) | |
INFO:birdwatch.matrix_factorization:epoch 20 0.3523634970188141 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.29398736357688904 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.10336979478597641 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07482481002807617 | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.matrix_factorization:epoch 100 0.1033562421798706 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07459260523319244 | |
INFO:birdwatch.matrix_factorization:Num epochs: 103 | |
INFO:birdwatch.matrix_factorization:epoch 103 0.10335546731948853 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07474108040332794 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1712927222251892 | |
INFO:birdwatch.constants:Final round MF elapsed time: 48.54 secs (0.81 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_3 prescoring, about to call diligence with 3271398 final round ratings. | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_14: 8750fbc0c4a8c6856c0715a3c2a3bee99a099cc1d1585c5b7578c6bc4577c403 | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 000011269AD6F327AED0F4086A732B4052F9D28E8791E1... -0.093447 | |
1 00005300B9017670433392BF6767238D54E058EC25D5C5... -0.269374 | |
2 0006F1E9A72BC327122346B1EC672566F8DE4304BC7813... -0.141926 | |
3 0008CE6A2932D0D88C4965BDA83BD8CE906EC91A951066... -0.599729 | |
4 00099B57E40688AFECCE8A3415A2AC45FD8944C33ACB9C... -0.491374 | |
... ... ... | |
34624 FFFBB4B078CA1D3C3E23B986FA1A0BD4B3081E70C2B274... -0.753161 | |
34625 FFFC156EAADE44C6CB99B0EB02DB63AAA7DC330AFC0E4B... -0.631799 | |
34626 FFFC37B8B75A047FC218F52FF5F03C876A906BD09B0F34... 0.293752 | |
34627 FFFD98FC04D3E1615C8BF2617DA7EA6BAEDCED7C9BFDC0... -0.252223 | |
34628 FFFECB9745EFB9D109358D450779F68A96A14C9AC03AD4... -0.541202 | |
[34629 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 34629, vs. num we are initializing: 34629 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 34629 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=14.805888 | time=0.0s | |
INFO:birdwatch.matrix_factorization:epoch 40 0.13850364089012146 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10664162039756775 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 10521432, Num Unique Notes Rated: 399883, Num Unique Raters: 60009 | |
INFO:birdwatch.scorer:MFGroupScorer_14 Prepare ratings elapsed time: 5.52 secs (0.09 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.496497 | time=7.5s | |
INFO:birdwatch.matrix_factorization:epoch 60 0.11227183789014816 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08500148355960846 | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFCoreScorer Helpfulness scores post-harassment elapsed time: 25.70 secs (0.43 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.157873 | time=15.0s | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_14: 713b9493c620526b010044f56ad866b782e1ee54f793c296a29914910260195b | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_14: 6f492efd4f4511d61fd17a59c8e01c6947c592b05ce271b1f1bd38d02d012c73 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_14: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.109315 | time=22.5s | |
INFO:birdwatch.matrix_factorization:epoch 60 0.4547945559024811 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.33430689573287964 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.10882274806499481 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08255541324615479 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 60009, Notes: 399883 | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.02 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 26.31127604824411 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 175.33090036494525 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.5226768851280212 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.5221538543701172 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.48637136816978455 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3475351333618164 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.096207 | time=30.0s | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10838336497545242 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08214987069368362 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.090390 | time=37.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.086958 | time=44.9s | |
INFO:birdwatch.matrix_factorization:epoch 20 0.1959424614906311 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1440804898738861 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.10832209885120392 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08201057463884354 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.084658 | time=52.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.082972 | time=59.9s | |
INFO:birdwatch.matrix_factorization:epoch 140 0.10831545293331146 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08201276510953903 | |
INFO:birdwatch.matrix_factorization:Num epochs: 144 | |
INFO:birdwatch.matrix_factorization:epoch 144 0.10831514745950699 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08201918751001358 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.16088883578777313 | |
INFO:birdwatch.scorer:MFGroupScorer_1 First MF/stable init elapsed time: 92.06 secs (1.53 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_1 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.081795 | time=67.5s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.matrix_factorization:epoch 40 0.13994905352592468 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10389752686023712 | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.71 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.90 secs (0.02 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.84 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.080969 | time=75.0s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(0.5107, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.080947 | time=0.0s | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.77 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.84 secs (0.01 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.918434 | time=7.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.907470 | time=14.4s | |
INFO:birdwatch.matrix_factorization:epoch 60 0.12313978374004364 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09042132645845413 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.906580 | time=21.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=105 | loss=1.906534 | time=25.1s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.438351 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.356483 | time=4.3s | |
INFO:birdwatch.matrix_factorization:epoch 80 0.45029357075691223 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3314984142780304 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.355480 | time=8.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.355439 | time=10.7s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(1.6631, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 3.0810, 1.9065, 0.3554 | |
INFO:birdwatch.scorer:MFGroupScorer_3 Low Diligence MF elapsed time: 114.44 secs (1.91 mins) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.97 secs (0.58 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_1 Compute scored notes elapsed time: 48.02 secs (0.80 mins) | |
INFO:birdwatch.matrix_factorization:epoch 60 0.45473241806030273 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3342928886413574 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.matrix_factorization:epoch 80 0.11913703382015228 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08904113620519638 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.note_ratings:Total ratings: 6104849 post-tombstones and 1320 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 4684275, including 4684274 post-tombstones and 1 pre-tombstones. | |
INFO:birdwatch.note_ratings:Total valid ratings: 403022 | |
INFO:birdwatch.scorer:MFGroupScorer_1 Compute valid ratings elapsed time: 7.25 secs (0.12 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_1 Helpfulness scores pre-harassment elapsed time: 0.61 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.60 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.74 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.71 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.63 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.72 secs (0.01 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 56037 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 97085 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 45259 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 40803 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 5427307 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 4322552 | |
INFO:birdwatch.scorer:MFGroupScorer_1 Filtering by helpfulness score elapsed time: 7.11 secs (0.12 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 2776055 | |
1 172083 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 1374414 | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 599301 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 582446 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 413599 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 2798248, Num Unique Notes Rated: 93980, Num Unique Raters: 38669 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 2644212 | |
1 154036 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.05504730102549881 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 17.166194915474303 with BCEWithLogitsLoss | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 38669, Notes: 93980 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 29.774930836348158 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 72.36411595851975 | |
INFO:birdwatch.matrix_factorization:epoch 0 3.4313859939575195 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.536501407623291 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.11795955896377563 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08898502588272095 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.692321240901947 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.37830930948257446 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.46363604068756104 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3071403205394745 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.4301488399505615 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2954534888267517 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.42568063735961914 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2930283844470978 | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.33 secs (0.57 mins) | |
INFO:birdwatch.matrix_factorization:epoch 120 0.11748424917459488 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08904341608285904 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 382560 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 102895565 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 53270228 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.4250508248806 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2928185760974884 | |
INFO:birdwatch.constants:MFGroupScorer_3: Compute tag thresholds for percentiles elapsed time: 8.10 secs (0.14 mins) | |
INFO:birdwatch.matrix_factorization:epoch 120 0.4249611496925354 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2927919626235962 | |
INFO:birdwatch.matrix_factorization:Num epochs: 130 | |
INFO:birdwatch.matrix_factorization:epoch 130 0.42495229840278625 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.29278215765953064 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.23114803433418274 | |
INFO:birdwatch.scorer:MFGroupScorer_1 Harassment tag consensus elapsed time: 46.58 secs (0.78 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_1 Helpfulness scores post-harassment elapsed time: 1.08 secs (0.02 mins) | |
INFO:birdwatch.run_scoring:MFTopicScorer_Unassigned run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 56037 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 97085 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 41587 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.11721114069223404 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08911249041557312 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 37131 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 5427307 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 3354212 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 37131, Notes: 146091 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 22.95974426898303 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 90.33454525867873 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.39958667755126953 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.33497971296310425 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.11128383129835129 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08071509003639221 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.4496009349822998 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.33117789030075073 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.4502350389957428 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.33148500323295593 | |
INFO:birdwatch.run_scoring:MFTopicScorer_Unassigned run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFTopicScorer_Unassigned run_scorer_parallelizable: Loading data elapsed time: 23.74 secs (0.40 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFTopicScorer_Unassigned set to: 4 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10741733014583588 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08088459819555283 | |
INFO:birdwatch.matrix_factorization:epoch 160 0.1170102059841156 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08917636424303055 | |
INFO:birdwatch.scorer:Filtering ratings for MFTopicScorer_Unassigned. Original rating length: 120945188 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10678169876337051 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07980678230524063 | |
INFO:birdwatch.scorer: Ratings after topic filter: 0 | |
INFO:birdwatch.scorer: Ratings after group filter: 0 | |
INIT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scorer.py, in prescore, at line 286: pd.DataFrame(columns=self.get_internal_scored_notes_cols()), | |
PandasTypeError: Type expectation mismatch on noteId: found=object expected=int64 | |
INFO:birdwatch.scorer:MFTopicScorer_Unassigned Filter input elapsed time: 11.44 secs (0.19 mins) | |
INFO:birdwatch.run_scoring:MFTopicScorer_UkraineConflict run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.matrix_factorization:epoch 80 0.1066134124994278 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07897433638572693 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 382560, Notes: 1225337 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 43.473940638371324 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 139.24672731074864 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10660293698310852 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07898107916116714 | |
INFO:birdwatch.matrix_factorization:Num epochs: 101 | |
INFO:birdwatch.matrix_factorization:epoch 101 0.10660293698310852 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07898107916116714 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1644572913646698 | |
INFO:birdwatch.constants:Final round MF elapsed time: 49.46 secs (0.82 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_1 prescoring, about to call diligence with 3354212 final round ratings. | |
INFO:birdwatch.matrix_factorization:epoch 0 0.37755998969078064 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3174588978290558 | |
INFO:birdwatch.matrix_factorization:epoch 180 0.1168489009141922 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08924125880002975 | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 0000D09E403B665ADB698D8DF843CB22F352EF89ABF7CB... -0.543661 | |
1 00037C8B3793B9E44C7885F55752A06806091F68598CC0... -0.515672 | |
2 00039991A9322D52F83399BC5B951F43B2A73869C21F10... -0.601601 | |
3 000402D0CF8FEC70E5C4BA76322215AE1A965BBE8A7568... 0.268096 | |
4 0004DC6827440EF91C141691934452677C533B6CA90AC4... -0.288777 | |
... ... ... | |
37126 FFF1B7F5E3903007BC3D5724DA6C406F78DEE26BE8456C... 0.473115 | |
37127 FFF48D8AD66904B961AF600709250FD2CB54004147EB44... -0.198971 | |
37128 FFF8367EF46CACBB9D7C020C910B12A206DAC9BA5E05A9... -0.438402 | |
37129 FFF9D85CEB466E2694589895B9D234CD48219AC8D3ADC4... -0.194319 | |
37130 FFFDEAD3B6BBA58927423C9C907473FD24FFEEACB4396E... 0.091560 | |
[37131 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 37131, vs. num we are initializing: 37131 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 37131 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=18.543821 | time=0.1s | |
INFO:birdwatch.run_scoring:MFTopicScorer_UkraineConflict run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFTopicScorer_UkraineConflict run_scorer_parallelizable: Loading data elapsed time: 24.11 secs (0.40 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFTopicScorer_UkraineConflict set to: 4 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.750908 | time=8.2s | |
INFO:birdwatch.scorer:Filtering ratings for MFTopicScorer_UkraineConflict. Original rating length: 120945188 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.313851 | time=15.9s | |
INFO:birdwatch.matrix_factorization:epoch 200 0.11671170592308044 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08930625766515732 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.252604 | time=23.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.237291 | time=31.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.231471 | time=39.0s | |
INFO:birdwatch.matrix_factorization:epoch 220 0.11659480631351471 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08937092870473862 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.228590 | time=46.6s | |
INFO:birdwatch.scorer: Ratings after topic filter: 4141060 | |
INFO:birdwatch.scorer: Ratings after group filter: 4141060 | |
INFO:birdwatch.scorer:MFTopicScorer_UkraineConflict Filter input elapsed time: 42.09 secs (0.70 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.4494892358779907 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3311305642127991 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFTopicScorer_UkraineConflict: 3ebbbc69ceb15ad170a310e32ddccee8e28dd5fa596a775769dbd80fe23e2fab | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.226951 | time=54.3s | |
INFO:birdwatch.matrix_factorization:epoch 100 0.4495398998260498 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.331163227558136 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 3246773, Num Unique Notes Rated: 36447, Num Unique Raters: 78578 | |
INFO:birdwatch.scorer:MFTopicScorer_UkraineConflict Prepare ratings elapsed time: 2.43 secs (0.04 mins) | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFTopicScorer_UkraineConflict: 8c50a83542a2436dbc3a9b7e5f8d96cfff3cdfb4a8b3da0eee95159465cc8168 | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFTopicScorer_UkraineConflict: 2b4d08da3b75cca6e752f688fedada6e794b7f40adf61c96bb398a7a5c1e1937 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFTopicScorer_UkraineConflict: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 78578, Notes: 36447 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 89.0820369303372 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 41.31910967446359 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.647773742675781 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.182163238525391 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.225983 | time=62.0s | |
INFO:birdwatch.matrix_factorization:epoch 240 0.11649289727210999 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08943793922662735 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.32890453934669495 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.27723658084869385 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.11145132780075073 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0823032334446907 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.225348 | time=69.7s | |
INFO:birdwatch.matrix_factorization:epoch 40 0.12462562322616577 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08701226860284805 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.224914 | time=77.3s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(1.0254, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.224902 | time=0.0s | |
INFO:birdwatch.matrix_factorization:Num epochs: 127 | |
INFO:birdwatch.matrix_factorization:epoch 127 0.44947731494903564 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.33112481236457825 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.2023831307888031 | |
INFO:birdwatch.scorer:MFExpansionPlusScorer Harassment tag consensus elapsed time: 682.20 secs (11.37 mins) | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09235216677188873 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06311653554439545 | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.035035 | time=7.4s | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.matrix_factorization:epoch 80 0.08770240843296051 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06003275513648987 | |
INFO:birdwatch.matrix_factorization:epoch 260 0.1163988932967186 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08950584381818771 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.024729 | time=14.8s | |
INFO:birdwatch.matrix_factorization:epoch 100 0.0871426984667778 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05961025878787041 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.023927 | time=22.2s | |
INFO:birdwatch.matrix_factorization:epoch 120 0.08706974983215332 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0595758818089962 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=2.023901 | time=24.6s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.398614 | time=0.0s | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFExpansionPlusScorer Helpfulness scores post-harassment elapsed time: 25.16 secs (0.42 mins) | |
INFO:birdwatch.matrix_factorization:epoch 140 0.08705941587686539 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05956045538187027 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.317246 | time=4.4s | |
INFO:birdwatch.matrix_factorization:Num epochs: 144 | |
INFO:birdwatch.matrix_factorization:epoch 144 0.08705910295248032 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05955982208251953 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.15050366520881653 | |
INFO:birdwatch.scorer:MFTopicScorer_UkraineConflict First MF/stable init elapsed time: 47.95 secs (0.80 mins) | |
INFO:birdwatch.mf_base_scorer:Skipping rep-filtering in prescoring for MFTopicScorer_UkraineConflict | |
/home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py:573: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
helpfulnessScores[ | |
INFO:birdwatch.mf_base_scorer:In MFTopicScorer_UkraineConflict prescoring, about to call diligence with 3246773 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 056B1936908F42285AC8A4E4CD928C9BC3DAD8547FEE39... -0.729328 | |
1 F35972BBD2F99515FD974E9C7AFD899970F2E4A5911513... 0.162987 | |
2 67B54620C2319FCDE70894F7B1D89C882952908664A35D... -0.816580 | |
3 E23374E04DD1B97ED5E4BE68F56CD25AE5DE53DD2A3541... -0.217508 | |
4 E462D40CC316ED0864D77A36DA000DA98A8A6F61C204DE... -0.774798 | |
... ... ... | |
78573 7F7389294115E9220A24B85275C74D18FDB99EEB0E14D7... -0.679234 | |
78574 9807DA2C5AE0CAD796716CE294B7C2B934961C61D93F10... -0.070162 | |
78575 B1FFD6BD0C720E70F89339D884134220B070056E256926... -0.554269 | |
78576 59ADB7D6CCD4A96D5D78ADFA69D331F9929C1E1998457C... 0.498540 | |
78577 148D27E8C205D49A0E19D2092330015CC9FE372C3683D2... 0.228918 | |
[78578 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 78578, vs. num we are initializing: 78578 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 78578 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=20.511316 | time=0.1s | |
INFO:birdwatch.matrix_factorization:epoch 280 0.11631342768669128 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0895763412117958 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.316269 | time=8.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.316227 | time=11.1s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(2.2295, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 3.2249, 2.0239, 0.3162 | |
INFO:birdwatch.scorer:MFGroupScorer_1 Low Diligence MF elapsed time: 116.81 secs (1.95 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.776832 | time=8.1s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.63 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.89 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.73 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.66 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.95 secs (0.02 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.266505 | time=16.2s | |
INFO:birdwatch.matrix_factorization:epoch 300 0.11623996496200562 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08964435756206512 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.198164 | time=24.3s | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10985996574163437 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08399942517280579 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.44942978024482727 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.33111733198165894 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.183880 | time=32.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.179201 | time=40.5s | |
INFO:birdwatch.matrix_factorization:epoch 320 0.11617498844861984 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08971162140369415 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.177233 | time=48.5s | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.44 secs (0.56 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.176270 | time=56.5s | |
INFO:birdwatch.constants:MFGroupScorer_1: Compute tag thresholds for percentiles elapsed time: 8.69 secs (0.14 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.175751 | time=64.6s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.run_scoring:MFTopicScorer_GazaConflict run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.matrix_factorization:epoch 340 0.11611758172512054 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08977796137332916 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.175466 | time=73.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.175302 | time=82.5s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(1.3169, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.175298 | time=0.0s | |
INFO:birdwatch.run_scoring:MFTopicScorer_GazaConflict run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFTopicScorer_GazaConflict run_scorer_parallelizable: Loading data elapsed time: 24.12 secs (0.40 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFTopicScorer_GazaConflict set to: 4 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.833224 | time=7.8s | |
INFO:birdwatch.matrix_factorization:epoch 360 0.11606357246637344 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0898435115814209 | |
INFO:birdwatch.scorer:Filtering ratings for MFTopicScorer_GazaConflict. Original rating length: 120945188 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10892382264137268 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08213447779417038 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.826188 | time=15.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.825087 | time=23.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.825087 | time=23.2s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.292382 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.203168 | time=4.7s | |
INFO:birdwatch.matrix_factorization:epoch 380 0.11602069437503815 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0899050161242485 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.202120 | time=9.3s | |
INFO:birdwatch.matrix_factorization:epoch 140 0.44940435886383057 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.33110469579696655 | |
INFO:birdwatch.matrix_factorization:Num epochs: 141 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.202074 | time=11.6s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(3.1794, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 3.1753, 1.8251, 0.2021 | |
INFO:birdwatch.matrix_factorization:epoch 141 0.44940435886383057 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.33110469579696655 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.2023589015007019 | |
INFO:birdwatch.scorer:MFTopicScorer_UkraineConflict Low Diligence MF elapsed time: 121.25 secs (2.02 mins) | |
INFO:birdwatch.scorer:MFExpansionScorer Harassment tag consensus elapsed time: 738.64 secs (12.31 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 0.73 secs (0.01 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 760595 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 722512 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 549629 | |
INFO:birdwatch.constants:MFTopicScorer_UkraineConflict: Compute tag thresholds for percentiles elapsed time: 7.00 secs (0.12 mins) | |
INFO:birdwatch.scorer: Ratings after topic filter: 12154882 | |
INFO:birdwatch.scorer: Ratings after group filter: 12154882 | |
INFO:birdwatch.scorer:MFTopicScorer_GazaConflict Filter input elapsed time: 42.69 secs (0.71 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.matrix_factorization:epoch 400 0.11597800254821777 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08996281027793884 | |
INFO:birdwatch.run_scoring:MFTopicScorer_MessiRonaldo run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.matrix_factorization:Num epochs: 403 | |
INFO:birdwatch.matrix_factorization:epoch 403 0.11597742885351181 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08996936678886414 | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFExpansionScorer Helpfulness scores post-harassment elapsed time: 25.26 secs (0.42 mins) | |
INFO:birdwatch.mf_base_scorer:ratings summary MFTopicScorer_GazaConflict: 0899bc0f0b8e91f6d592ed88698b7285e7b1add26ce9f5b207e820ada6913ffa | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 11007207, Num Unique Notes Rated: 97238, Num Unique Raters: 162717 | |
INFO:birdwatch.scorer:MFTopicScorer_GazaConflict Prepare ratings elapsed time: 6.22 secs (0.10 mins) | |
INFO:birdwatch.run_scoring:MFTopicScorer_MessiRonaldo run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFTopicScorer_MessiRonaldo run_scorer_parallelizable: Loading data elapsed time: 22.81 secs (0.38 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFTopicScorer_MessiRonaldo set to: 4 | |
INFO:birdwatch.scorer:Filtering ratings for MFTopicScorer_MessiRonaldo. Original rating length: 120945188 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 60009, Notes: 399883 | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.02 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:epoch 0 0.12069502472877502 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.5224084258079529 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.1088177040219307 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08170998096466064 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 496789 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 119170712 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 62806087 | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFTopicScorer_GazaConflict: 1e7e6706af13d46a8ac6024775a9bfaf6ebbbbf59c2cff623c20b50c6fea550e | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFTopicScorer_GazaConflict: efd733efe14125b19a0ebe45a27bba926b5b7b6bf1da864ff3ba9c6d7360dd80 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFTopicScorer_GazaConflict: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 162717, Notes: 97238 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 113.1986157674983 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 67.6463246003798 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.694653511047363 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.244139671325684 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.06298001110553741 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1837863177061081 | |
INFO:birdwatch.scorer: Ratings after topic filter: 200420 | |
INFO:birdwatch.scorer: Ratings after group filter: 200420 | |
INFO:birdwatch.scorer:MFTopicScorer_MessiRonaldo Filter input elapsed time: 33.10 secs (0.55 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFTopicScorer_MessiRonaldo: e6d16f654b7f915c6719c982b2a372a78809faad60175103e807127208d34cbb | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 61938, Num Unique Notes Rated: 2412, Num Unique Raters: 2779 | |
INFO:birdwatch.scorer:MFTopicScorer_MessiRonaldo Prepare ratings elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFTopicScorer_MessiRonaldo: 9df4e190b3da2713c39bb7713fd53cb2449d8a65127a41b7f0a875af8f84a2ea | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFTopicScorer_MessiRonaldo: dacd600322f45e8892bca58dfa4907e6e35fcea6a202ac110c602c2e19f75663 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFTopicScorer_MessiRonaldo: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 2779, Notes: 2412 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 25.67910447761194 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 22.28787333573228 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.993242263793945 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.488073825836182 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.3545120656490326 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2964816391468048 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.11567525565624237 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07354683429002762 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.08206673711538315 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.04683299362659454 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.07688583433628082 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.04257965087890625 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.07625867426395416 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.042176634073257446 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.07617507874965668 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.042144738137722015 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.07616481930017471 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.04213244840502739 | |
INFO:birdwatch.matrix_factorization:Num epochs: 148 | |
INFO:birdwatch.matrix_factorization:epoch 148 0.07616391777992249 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.04213307052850723 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1668098121881485 | |
INFO:birdwatch.scorer:MFTopicScorer_MessiRonaldo First MF/stable init elapsed time: 1.58 secs (0.03 mins) | |
INFO:birdwatch.mf_base_scorer:Skipping rep-filtering in prescoring for MFTopicScorer_MessiRonaldo | |
/home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py:573: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
helpfulnessScores[ | |
INFO:birdwatch.mf_base_scorer:In MFTopicScorer_MessiRonaldo prescoring, about to call diligence with 61938 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 90D27164DF30535EDB518FAD15DEE8728388F8CA14C75E... -0.769909 | |
1 4E44139DB610989839A095579EBA2EF46825BF25E13FFD... 0.221982 | |
2 2E31629F722BF87A215706A6311E21E123A4624B4D10E2... -0.752639 | |
3 6745B794E9C46A45ABF33E250B5053EC684C28F888355F... -0.286003 | |
4 5C923A1ACF69C684AFABDF63F42734BDCC5FE2B8E3611A... -0.630027 | |
... ... ... | |
2774 045F1138655BBE7C9639E8A42B32AA1D7EE213DBD2832D... -0.572096 | |
2775 9C215E6717E4162DE9313F8214978A746A4A62897430AB... 0.550050 | |
2776 F8EE952D0039E1B181339BB21EB00307992E77E8E52CBF... 0.609762 | |
2777 143AE16F0329577DFF5F747B4565E180A03048654AEBB6... -0.663425 | |
2778 E10D7A9362563027C336080E747085D504D542CC799CFB... 0.529393 | |
[2779 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 2779, vs. num we are initializing: 2779 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 2779 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=18.717415 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.410041 | time=0.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.804307 | time=0.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.696636 | time=0.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.668390 | time=1.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.656887 | time=1.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.651020 | time=1.5s | |
INFO:birdwatch.matrix_factorization:epoch 20 0.3629034757614136 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.31634023785591125 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.647699 | time=1.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=2.645640 | time=2.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=2.644296 | time=2.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=2.643394 | time=2.6s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(0.1359, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=2.643369 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.323995 | time=0.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.318010 | time=0.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.316943 | time=0.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.316943 | time=0.7s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.410452 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.306273 | time=0.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.305091 | time=0.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.305035 | time=0.4s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(1.8759, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 2.6434, 1.3169, 0.3050 | |
INFO:birdwatch.scorer:MFTopicScorer_MessiRonaldo Low Diligence MF elapsed time: 3.82 secs (0.06 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 0.08 secs (0.00 mins) | |
INFO:birdwatch.constants:MFTopicScorer_MessiRonaldo: Compute tag thresholds for percentiles elapsed time: 0.28 secs (0.00 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.run_scoring:MFMultiGroupScorer_1 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.matrix_factorization:epoch 40 0.057055845856666565 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1782406121492386 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.11413615942001343 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0796656683087349 | |
INFO:birdwatch.run_scoring:MFMultiGroupScorer_1 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFMultiGroupScorer_1 run_scorer_parallelizable: Loading data elapsed time: 23.88 secs (0.40 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFMultiGroupScorer_1 set to: 4 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 496789, Notes: 1319220 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 47.6085012355786 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 126.42406937351673 | |
INFO:birdwatch.scorer:Filtering ratings for MFMultiGroupScorer_1. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.3719310164451599 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.31204769015312195 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.04865109547972679 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.12749989330768585 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10880700498819351 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08162391185760498 | |
INFO:birdwatch.matrix_factorization:Num epochs: 101 | |
INFO:birdwatch.matrix_factorization:epoch 101 0.10880700498819351 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08162391185760498 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.16981279850006104 | |
INFO:birdwatch.constants:Final round MF elapsed time: 558.46 secs (9.31 mins) | |
INFO:birdwatch.mf_base_scorer:In MFCoreScorer prescoring, about to call diligence with 53270228 final round ratings. | |
INFO:birdwatch.matrix_factorization:epoch 60 0.08603987842798233 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.058530088514089584 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.04556620121002197 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1157870888710022 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.08239531517028809 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.055343035608530045 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings after group filter: 6257534 | |
INFO:birdwatch.scorer:MFMultiGroupScorer_1 Filter input elapsed time: 49.67 secs (0.83 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 760575 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 722844 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 549781 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.08192259818315506 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05499133840203285 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.04448603838682175 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1144147738814354 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFMultiGroupScorer_1: 3f4d9b4462531bf76fcc90ca6df393aebf1250a906a431a895bfaae8edba864d | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 5396149, Num Unique Notes Rated: 227259, Num Unique Raters: 55432 | |
INFO:birdwatch.scorer:MFMultiGroupScorer_1 Prepare ratings elapsed time: 2.86 secs (0.05 mins) | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFMultiGroupScorer_1: 863bf755f5c7b4eb8d22d1649914d6eb22e7068147302217f10a86b6e3bc5555 | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFMultiGroupScorer_1: 6f492efd4f4511d61fd17a59c8e01c6947c592b05ce271b1f1bd38d02d012c73 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFMultiGroupScorer_1: 1399dc2994266dc0c33b0e8cbb6bd1846586e58449dac3e640a99ce2ffc5effa | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 55432, Notes: 227259 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 23.74448976718194 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 97.34718213306394 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.220770835876465 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 5.769021034240723 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.08185769617557526 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05496711656451225 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.04402000457048416 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11398404836654663 | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 000011269AD6F327AED0F4086A732B4052F9D28E8791E1... -0.172086 | |
1 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... 0.691823 | |
2 00005300B9017670433392BF6767238D54E058EC25D5C5... -0.284578 | |
3 00007B885907790E492F8C9A31F1AFC20831279328C263... 0.437021 | |
4 0000AE9A69E1B5D132C053E253DC42A007EDE2F11C39CF... 0.418497 | |
... ... ... | |
382555 FFFFA008A90B7144EF2CC117355D4B4743C471CA9B2DCA... 0.497879 | |
382556 FFFFAB2FDBC1968F4CFE97A86D88963D702B636365B6CD... 0.019430 | |
382557 FFFFBBAB3C66ABB4DBC2A3B486C3C673345C89B5858465... 0.276673 | |
382558 FFFFC819886B2F837503D840D59EE8321A835AAF2B5C1E... -0.188490 | |
382559 FFFFFE8909485374E33854B934713713CAC93CDB50C9D0... 0.558081 | |
[382560 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 382560, vs. num we are initializing: 382560 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 382560 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=15.795460 | time=0.7s | |
INFO:birdwatch.matrix_factorization:epoch 20 0.3171849548816681 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2528494894504547 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.11216723173856735 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08331191539764404 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 496867 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 119168555 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 62758781 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.08184971660375595 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.054957516491413116 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.15018439292907715 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11736781150102615 | |
INFO:birdwatch.matrix_factorization:Num epochs: 145 | |
INFO:birdwatch.matrix_factorization:epoch 145 0.08184927701950073 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05495864152908325 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1444670557975769 | |
INFO:birdwatch.scorer:MFTopicScorer_GazaConflict First MF/stable init elapsed time: 152.20 secs (2.54 mins) | |
INFO:birdwatch.mf_base_scorer:Skipping rep-filtering in prescoring for MFTopicScorer_GazaConflict | |
/home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py:573: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
helpfulnessScores[ | |
INFO:birdwatch.mf_base_scorer:In MFTopicScorer_GazaConflict prescoring, about to call diligence with 11007207 final round ratings. | |
INFO:birdwatch.matrix_factorization:epoch 140 0.0437372662127018 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11355447769165039 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.1191929280757904 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08953303843736649 | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 F35972BBD2F99515FD974E9C7AFD899970F2E4A5911513... 0.703149 | |
1 9E93A0C21A1CD3DD7C3A772E71A2DD0B6E79103B020A32... 0.786631 | |
2 EF12C150CE8A147E0804CFBEA80649018A15435E54C4E5... 0.818830 | |
3 EBDCB80B1EC4A9FB51C8A562377D72F9569692DEFFC8BC... 0.645276 | |
4 70B62959F72CA22F3697BD4E5674B3990AD91893FD9320... 0.775102 | |
... ... ... | |
162712 7F7389294115E9220A24B85275C74D18FDB99EEB0E14D7... 0.399047 | |
162713 0BBD746C51DEAC678D11F36311B921D6896E4093DD2D96... -0.404789 | |
162714 9807DA2C5AE0CAD796716CE294B7C2B934961C61D93F10... 0.088557 | |
162715 B1FFD6BD0C720E70F89339D884134220B070056E256926... 0.246175 | |
162716 59ADB7D6CCD4A96D5D78ADFA69D331F9929C1E1998457C... 0.618130 | |
[162717 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 162717, vs. num we are initializing: 162717 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 162717 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=24.941711 | time=0.2s | |
INFO:birdwatch.matrix_factorization:epoch 80 0.11423240602016449 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08624144643545151 | |
INFO:birdwatch.matrix_factorization:epoch 160 0.04354528710246086 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11331626772880554 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.11362311989068985 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08580733090639114 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.997875 | time=26.2s | |
INFO:birdwatch.matrix_factorization:epoch 120 0.1135471910238266 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08574097603559494 | |
INFO:birdwatch.matrix_factorization:epoch 180 0.04341157525777817 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11314854770898819 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.1135365217924118 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08574177324771881 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 496867, Notes: 1319203 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:Num epochs: 146 | |
INFO:birdwatch.matrix_factorization:epoch 146 0.11353592574596405 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08574138581752777 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17063555121421814 | |
INFO:birdwatch.scorer:MFMultiGroupScorer_1 First MF/stable init elapsed time: 85.73 secs (1.43 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFMultiGroupScorer_1 | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 47.57325521545964 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 126.30901428350042 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.704073 | time=75.2s | |
INFO:birdwatch.matrix_factorization:epoch 0 0.37204861640930176 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3121606111526489 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.395800 | time=52.4s | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.72 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:epoch 200 0.043299153447151184 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11298008263111115 | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.73 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.64 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.83 secs (0.01 mins) | |
INFO:birdwatch.matrix_factorization:epoch 40 0.11100354045629501 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08524340391159058 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.305933 | time=78.6s | |
INFO:birdwatch.matrix_factorization:epoch 220 0.04319792985916138 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11280490458011627 | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.96 secs (0.58 mins) | |
INFO:birdwatch.scorer:MFMultiGroupScorer_1 Compute scored notes elapsed time: 49.19 secs (0.82 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.note_ratings:Total ratings: 6256420 post-tombstones and 1114 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 4941544, including 4941537 post-tombstones and 7 pre-tombstones. | |
INFO:birdwatch.matrix_factorization:epoch 240 0.04310733079910278 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11263182014226913 | |
INFO:birdwatch.note_ratings:Total valid ratings: 561970 | |
INFO:birdwatch.scorer:MFMultiGroupScorer_1 Compute valid ratings elapsed time: 8.10 secs (0.14 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFMultiGroupScorer_1 Helpfulness scores pre-harassment elapsed time: 0.78 secs (0.01 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.286150 | time=104.7s | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 55432 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 127143 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 50203 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 44219 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 5396149 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 4469939 | |
INFO:birdwatch.scorer:MFMultiGroupScorer_1 Filtering by helpfulness score elapsed time: 7.44 secs (0.12 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 2859553 | |
1 256532 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 1353854 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 2874586, Num Unique Notes Rated: 146011, Num Unique Raters: 41160 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 2657541 | |
1 217045 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.07550478573262376 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 12.244193600405445 with BCEWithLogitsLoss | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 41160, Notes: 146011 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 19.68746190355521 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 69.83931000971818 | |
INFO:birdwatch.matrix_factorization:epoch 0 3.4441781044006348 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.542120337486267 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.6947866082191467 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.4026806056499481 | |
INFO:birdwatch.matrix_factorization:epoch 260 0.043023109436035156 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1124739944934845 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.367031 | time=155.2s | |
INFO:birdwatch.matrix_factorization:epoch 40 0.4966541528701782 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3342677354812622 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.279499 | time=130.5s | |
INFO:birdwatch.matrix_factorization:epoch 20 0.11212554574012756 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08325662463903427 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.46825674176216125 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3246450424194336 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.1098654568195343 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08289977163076401 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.4643974304199219 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.32281479239463806 | |
INFO:birdwatch.matrix_factorization:epoch 280 0.04294116422533989 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11232009530067444 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.46387386322021484 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.32255950570106506 | |
INFO:birdwatch.matrix_factorization:Num epochs: 101 | |
INFO:birdwatch.matrix_factorization:epoch 101 0.46387386322021484 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.32255950570106506 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.23407363891601562 | |
INFO:birdwatch.scorer:MFMultiGroupScorer_1 Harassment tag consensus elapsed time: 39.94 secs (0.67 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFMultiGroupScorer_1 Helpfulness scores post-harassment elapsed time: 1.37 secs (0.02 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.276617 | time=156.9s | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 55432 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 127143 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 45799 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 39815 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 5396149 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 3587397 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 39815, Notes: 227080 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 15.797943456050731 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 90.10164510862741 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.3791448175907135 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.31287574768066406 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.11126857995986938 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08050812780857086 | |
INFO:birdwatch.matrix_factorization:epoch 300 0.042860426008701324 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11217564344406128 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10951262712478638 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0819113552570343 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.275117 | time=182.9s | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10821569710969925 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07919583469629288 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.10816728323698044 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07924497872591019 | |
INFO:birdwatch.matrix_factorization:epoch 320 0.04278457909822464 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11203478276729584 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10815171897411346 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07901245355606079 | |
INFO:birdwatch.matrix_factorization:Num epochs: 103 | |
INFO:birdwatch.matrix_factorization:epoch 103 0.10815156996250153 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07916077971458435 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17339570820331573 | |
INFO:birdwatch.constants:Final round MF elapsed time: 50.60 secs (0.84 mins) | |
INFO:birdwatch.mf_base_scorer:In MFMultiGroupScorer_1 prescoring, about to call diligence with 3587397 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 00003B703F86036C51F4F4B4C9F77B00C92D882421DA73... -0.470690 | |
1 00018D8DDD8FE5AD262631A9CA08190AB95942067312FD... -0.097652 | |
2 0001C21FD89AC65310D4D74174C0986CDF457DA24DADAB... 0.019454 | |
3 0003B87251FE6860759A856C73472561F9A37C4813053E... -0.321227 | |
4 0003E67BB62E658363186A00B13637CF1A58748C4E4ECE... 0.178776 | |
... ... ... | |
39810 FFF10C79740909DEDBBF234382D89BF3F3D4750C5E983B... 0.231334 | |
39811 FFF3E935633C6870DE7674D0681C5821BC408073C84A36... 0.103689 | |
39812 FFF89590FF300D0348631F2F16AA908F663A888A3F82E0... 0.400786 | |
39813 FFFBC05DB8408BB532985642C4DE00EC619B062CB60E2E... 0.312443 | |
39814 FFFE8C4E72CFDBD164D87E0FDA30F8334EC8B6013F1238... 0.348604 | |
[39815 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 39815, vs. num we are initializing: 39815 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 39815 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=17.491796 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.325250 | time=236.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.274230 | time=209.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.650938 | time=8.2s | |
INFO:birdwatch.matrix_factorization:epoch 340 0.042706988751888275 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11189758777618408 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.11096478998661041 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0851963683962822 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.228275 | time=16.3s | |
INFO:birdwatch.matrix_factorization:epoch 80 0.10979072749614716 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08270809054374695 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.160544 | time=24.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.273654 | time=235.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.140998 | time=33.2s | |
INFO:birdwatch.matrix_factorization:epoch 360 0.042632780969142914 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11176253110170364 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.132281 | time=41.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.127349 | time=49.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.273250 | time=261.7s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(1.6443, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.273239 | time=0.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.124120 | time=57.7s | |
INFO:birdwatch.matrix_factorization:epoch 380 0.04256343096494675 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11163689196109772 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.121939 | time=65.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.120432 | time=74.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.903910 | time=26.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.119295 | time=82.1s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(0.7297, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.119263 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.316339 | time=317.7s | |
INFO:birdwatch.matrix_factorization:epoch 400 0.04249825328588486 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11152413487434387 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.992558 | time=8.1s | |
INFO:birdwatch.matrix_factorization:Num epochs: 411 | |
INFO:birdwatch.matrix_factorization:epoch 411 0.042470186948776245 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11146847903728485 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1539715677499771 | |
INFO:birdwatch.scorer:MFGroupScorer_14 First MF/stable init elapsed time: 951.67 secs (15.86 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_14 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.981636 | time=16.1s | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10977811366319656 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08255892246961594 | |
INFO:birdwatch.matrix_factorization:Num epochs: 101 | |
INFO:birdwatch.matrix_factorization:epoch 101 0.10977811366319656 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08255892246961594 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.1098351702094078 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0828741118311882 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17136713862419128 | |
INFO:birdwatch.constants:Final round MF elapsed time: 641.61 secs (10.69 mins) | |
INFO:birdwatch.mf_base_scorer:In MFExpansionPlusScorer prescoring, about to call diligence with 62806087 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.980798 | time=24.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.897679 | time=52.2s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.980763 | time=26.8s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.475615 | time=0.0s | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.68 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.74 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.70 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.391722 | time=4.7s | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.67 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.73 secs (0.01 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.390697 | time=9.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.390654 | time=11.7s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(1.8796, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 3.1193, 1.9808, 0.3907 | |
INFO:birdwatch.scorer:MFMultiGroupScorer_1 Low Diligence MF elapsed time: 125.16 secs (2.09 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.62 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.77 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.71 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.61 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.896501 | time=78.2s | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.71 secs (0.01 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.896465 | time=86.7s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.250473 | time=0.1s | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.69 secs (0.58 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_14 Compute scored notes elapsed time: 55.50 secs (0.93 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.175413 | time=15.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.313438 | time=394.8s | |
INFO:birdwatch.note_ratings:Total ratings: 11421850 post-tombstones and 4020 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 9299407, including 9299136 post-tombstones and 271 pre-tombstones. | |
INFO:birdwatch.note_ratings:Total valid ratings: 441611 | |
INFO:birdwatch.scorer:MFGroupScorer_14 Compute valid ratings elapsed time: 14.82 secs (0.25 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_14 Helpfulness scores pre-harassment elapsed time: 1.03 secs (0.02 mins) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.58 secs (0.58 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.174604 | time=30.5s | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 60009 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 137625 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 39587 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=070 | loss=0.174574 | time=35.5s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(3.6029, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 3.2732, 1.8965, 0.1746 | |
INFO:birdwatch.scorer:MFTopicScorer_GazaConflict Low Diligence MF elapsed time: 396.10 secs (6.60 mins) | |
INFO:birdwatch.constants:MFMultiGroupScorer_1: Compute tag thresholds for percentiles elapsed time: 8.88 secs (0.15 mins) | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 38298 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 10521432 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 8028229 | |
INFO:birdwatch.scorer:MFGroupScorer_14 Filtering by helpfulness score elapsed time: 13.99 secs (0.23 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.matrix_factorization:epoch 80 0.10975932329893112 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08267231285572052 | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 4805847 | |
1 278264 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 2944118 | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 000011269AD6F327AED0F4086A732B4052F9D28E8791E1... -0.164351 | |
1 00003B703F86036C51F4F4B4C9F77B00C92D882421DA73... -0.453246 | |
2 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... 0.710366 | |
3 00005300B9017670433392BF6767238D54E058EC25D5C5... -0.275141 | |
4 00007B885907790E492F8C9A31F1AFC20831279328C263... 0.466115 | |
... ... ... | |
496784 FFFFAB2FDBC1968F4CFE97A86D88963D702B636365B6CD... 0.040002 | |
496785 FFFFBBAB3C66ABB4DBC2A3B486C3C673345C89B5858465... 0.290216 | |
496786 FFFFC46B8555A97065DB39F7D600C8BB643F7F3EBD810E... 0.061891 | |
496787 FFFFC819886B2F837503D840D59EE8321A835AAF2B5C1E... -0.215695 | |
496788 FFFFFE8909485374E33854B934713713CAC93CDB50C9D0... 0.575421 | |
[496789 rows x 2 columns] | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 4694922, Num Unique Notes Rated: 249596, Num Unique Raters: 36766 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 4453472 | |
1 241450 | |
dtype: int64 | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.04 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.05 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.04 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.03 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 496789, vs. num we are initializing: 496789 | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.04 secs (0.00 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.05142790444654884 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 18.44469662455995 with BCEWithLogitsLoss | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 496789 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=15.858751 | time=0.7s | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 2.11 secs (0.04 mins) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 36766, Notes: 249596 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 18.81008509751759 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 127.69738345210249 | |
INFO:birdwatch.matrix_factorization:epoch 0 3.191194534301758 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.328877329826355 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.6886484026908875 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3510032594203949 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.4418139159679413 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2804913818836212 | |
INFO:birdwatch.constants:MFTopicScorer_GazaConflict: Compute tag thresholds for percentiles elapsed time: 24.59 secs (0.41 mins) | |
INFO:birdwatch.matrix_factorization:epoch 60 0.4074273705482483 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.26774466037750244 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.4025745987892151 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.26587173342704773 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.312109 | time=471.4s | |
INFO:birdwatch.matrix_factorization:epoch 100 0.4018949866294861 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2655237317085266 | |
INFO:birdwatch.matrix_factorization:Num epochs: 113 | |
INFO:birdwatch.matrix_factorization:epoch 113 0.4018162190914154 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2655215263366699 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.25328749418258667 | |
INFO:birdwatch.scorer:MFGroupScorer_14 Harassment tag consensus elapsed time: 65.76 secs (1.10 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 167: helpfulnessScores = helpfulnessScores.merge( | |
PandasTypeError: Output mismatch on totalHelpfulHarassmentPenalty: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_14 Helpfulness scores post-harassment elapsed time: 2.28 secs (0.04 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 60009 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 137625 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 35909 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.1097467839717865 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0825260654091835 | |
INFO:birdwatch.matrix_factorization:Num epochs: 101 | |
INFO:birdwatch.matrix_factorization:epoch 101 0.1097467839717865 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0825260654091835 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1713782697916031 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 34620 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 10521432 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 6020624 | |
INFO:birdwatch.constants:Final round MF elapsed time: 653.00 secs (10.88 mins) | |
INFO:birdwatch.mf_base_scorer:In MFExpansionScorer prescoring, about to call diligence with 62758781 final round ratings. | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 34620, Notes: 397801 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.02 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 15.134763361580287 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 173.9059503177354 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.1634453982114792 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.14385278522968292 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.718604 | time=86.9s | |
INFO:birdwatch.matrix_factorization:epoch 20 0.12274827063083649 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09218326210975647 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.11930923163890839 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08972907811403275 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.11864250898361206 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08917440474033356 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.311362 | time=546.6s | |
INFO:birdwatch.matrix_factorization:epoch 80 0.11847145110368729 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08894865959882736 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.11840848624706268 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08882999420166016 | |
INFO:birdwatch.matrix_factorization:Num epochs: 116 | |
INFO:birdwatch.matrix_factorization:epoch 116 0.11838555335998535 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08875240385532379 | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 000011269AD6F327AED0F4086A732B4052F9D28E8791E1... -0.165260 | |
1 00003B703F86036C51F4F4B4C9F77B00C92D882421DA73... -0.453323 | |
2 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... 0.710605 | |
3 00005300B9017670433392BF6767238D54E058EC25D5C5... -0.274709 | |
4 00007B885907790E492F8C9A31F1AFC20831279328C263... 0.464849 | |
... ... ... | |
496862 FFFFAB2FDBC1968F4CFE97A86D88963D702B636365B6CD... 0.038655 | |
496863 FFFFBBAB3C66ABB4DBC2A3B486C3C673345C89B5858465... 0.292151 | |
496864 FFFFC46B8555A97065DB39F7D600C8BB643F7F3EBD810E... 0.062043 | |
496865 FFFFC819886B2F837503D840D59EE8321A835AAF2B5C1E... -0.215246 | |
496866 FFFFFE8909485374E33854B934713713CAC93CDB50C9D0... 0.576884 | |
[496867 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 496867, vs. num we are initializing: 496867 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 496867 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=15.863570 | time=0.8s | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 34620, Notes: 397801 | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.02 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:epoch 0 0.08779740333557129 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.30733391642570496 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.385694 | time=172.9s | |
INFO:birdwatch.matrix_factorization:epoch 20 0.05155905708670616 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.12256365269422531 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.04741515964269638 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11213576048612595 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.310950 | time=622.4s | |
INFO:birdwatch.matrix_factorization:epoch 60 0.046249330043792725 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11291010677814484 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.045835163444280624 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11310164630413055 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.04563520848751068 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1133788526058197 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.719646 | time=86.8s | |
INFO:birdwatch.matrix_factorization:epoch 120 0.04551506042480469 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11373230069875717 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.344915 | time=257.9s | |
INFO:birdwatch.matrix_factorization:epoch 140 0.04542715847492218 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11400537192821503 | |
INFO:birdwatch.matrix_factorization:epoch 160 0.045356348156929016 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11423291265964508 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.310689 | time=699.3s | |
INFO:birdwatch.matrix_factorization:Num epochs: 168 | |
INFO:birdwatch.matrix_factorization:epoch 168 0.045335497707128525 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11429942399263382 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1642647087574005 | |
INFO:birdwatch.constants:Final round MF elapsed time: 219.46 secs (3.66 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_14 prescoring, about to call diligence with 6020624 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 0002725E706CF18C040E21F30CE2D39994513C3BB8CF58... 0.114668 | |
1 00032CF270BEF4007D6B24E33135CD078C72B0965FCD8D... -0.863593 | |
2 00054DA8CA53842EE3042D2E203830D7F023E91EC47259... -0.696787 | |
3 000818E860FC3D0209D9E2493FC76B78311313A011891F... -0.411655 | |
4 000A760155A8E91769E9D71F4DE644707CA31B077F0FDC... -0.862553 | |
... ... ... | |
34615 FFFD9C3BC7BB3A78D72C67E34A7BDEFAAFFC485AAE049D... 0.306456 | |
34616 FFFDDE9AE1DFCB76019D1A523D5CC586BB1AB22B878801... 0.374651 | |
34617 FFFF4DD649728988010BBC2B953A59797EA70028B58EA8... -0.651221 | |
34618 FFFFC819886B2F837503D840D59EE8321A835AAF2B5C1E... -0.271115 | |
34619 FFFFFE8909485374E33854B934713713CAC93CDB50C9D0... 0.711589 | |
[34620 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 34620, vs. num we are initializing: 34620 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 34620 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=16.595818 | time=0.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.578781 | time=14.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.181258 | time=28.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.385156 | time=172.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.115875 | time=42.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.336308 | time=344.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.096240 | time=57.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.310513 | time=777.9s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(1.0976, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.310509 | time=0.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.087276 | time=71.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.082028 | time=86.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.078642 | time=100.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.076359 | time=114.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.344345 | time=257.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.074747 | time=129.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.333598 | time=431.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.073623 | time=143.3s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(0.4285, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.073592 | time=0.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.072717 | time=75.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.920358 | time=13.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.908950 | time=27.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.908123 | time=41.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.908097 | time=45.8s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.440956 | time=0.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.369864 | time=8.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.368972 | time=16.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.368935 | time=20.7s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(1.5646, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 3.0736, 1.9081, 0.3689 | |
INFO:birdwatch.scorer:MFGroupScorer_14 Low Diligence MF elapsed time: 217.86 secs (3.63 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.335839 | time=342.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.062488 | time=150.6s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.65 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.77 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.74 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.332457 | time=518.9s | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.63 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.73 secs (0.01 mins) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.78 secs (0.58 mins) | |
INFO:birdwatch.constants:MFGroupScorer_14: Compute tag thresholds for percentiles elapsed time: 18.64 secs (0.31 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.061631 | time=225.7s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.333156 | time=427.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.339348 | time=0.7s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.331867 | time=606.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.254511 | time=51.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.332019 | time=512.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.253524 | time=102.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.331526 | time=692.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.253484 | time=128.6s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(2.5736, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 3.3105, 2.0616, 0.2535 | |
INFO:birdwatch.scorer:MFCoreScorer Low Diligence MF elapsed time: 1204.16 secs (20.07 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.331436 | time=595.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.331315 | time=778.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.331087 | time=679.1s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.62 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.16 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.87 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.04 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.72 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.63 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.85 secs (0.01 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.331172 | time=861.8s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(1.1620, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.331169 | time=0.7s | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 35.41 secs (0.59 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.330859 | time=761.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.091711 | time=81.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.330716 | time=843.0s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(1.1645, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.330712 | time=0.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.081404 | time=162.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.090706 | time=78.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.080528 | time=242.5s | |
INFO:birdwatch.constants:MFCoreScorer: Compute tag thresholds for percentiles elapsed time: 220.11 secs (3.67 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=2.080499 | time=269.5s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.336091 | time=0.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.080357 | time=157.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.249141 | time=55.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.079465 | time=235.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.248113 | time=110.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=2.079435 | time=261.8s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.335744 | time=0.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.248072 | time=137.8s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(2.6540, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 3.3312, 2.0805, 0.2481 | |
INFO:birdwatch.scorer:MFExpansionPlusScorer Low Diligence MF elapsed time: 1358.96 secs (22.65 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.249018 | time=58.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.247993 | time=117.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.247952 | time=147.1s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(2.6566, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 3.3307, 2.0794, 0.2480 | |
INFO:birdwatch.scorer:MFExpansionScorer Low Diligence MF elapsed time: 1339.42 secs (22.32 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.56 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.13 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.76 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.59 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.54 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.66 secs (0.01 mins) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 31.31 secs (0.52 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.55 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.12 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.75 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.03 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.63 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.55 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.71 secs (0.01 mins) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.29 secs (0.57 mins) | |
INFO:birdwatch.constants:MFExpansionPlusScorer: Compute tag thresholds for percentiles elapsed time: 213.45 secs (3.56 mins) | |
INFO:birdwatch.constants:MFExpansionScorer: Compute tag thresholds for percentiles elapsed time: 215.83 secs (3.60 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.run_scoring:Got model results from all scorers. | |
INFO:birdwatch.run_scoring:---- | |
Completed individual scorers. Ran in parallel: True. Succeeded in 5830.41 seconds. | |
Individual scorers: (name, runtime): [('MFCoreScorer', '80.32 mins'), ('MFExpansionScorer', '91.53 mins'), ('MFExpansionPlusScorer', '89.27 mins'), ('ReputationScorer', '35.18 mins'), ('MFGroupScorer_13', '32.03 mins'), ('MFGroupScorer_12', '2.85 mins'), ('MFGroupScorer_11', '3.47 mins'), ('MFGroupScorer_10', '3.10 mins'), ('MFGroupScorer_9', '8.77 mins'), ('MFGroupScorer_8', '2.76 mins'), ('MFGroupScorer_7', '3.75 mins'), ('MFGroupScorer_6', '7.72 mins'), ('MFGroupScorer_5', '2.58 mins'), ('MFGroupScorer_4', '4.10 mins'), ('MFGroupScorer_3', '8.36 mins'), ('MFGroupScorer_2', '3.25 mins'), ('MFGroupScorer_1', '8.34 mins'), ('MFGroupScorer_14', '28.63 mins'), ('MFTopicScorer_Unassigned', '0.19 mins'), ('MFTopicScorer_UkraineConflict', '3.98 mins'), ('MFTopicScorer_GazaConflict', '11.31 mins'), ('MFTopicScorer_MessiRonaldo', '0.68 mins'), ('MFMultiGroupScorer_1', '8.34 mins')] | |
---- | |
/home/ubuntu/communitynotes/sourcecode/scoring/pandas_utils.py:364: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation. | |
result = self._origConcat(*args, **kwargs) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/run_scoring.py, in combine_prescorer_scorer_results, at line 484: prescoringNoteModelOutput = pd.concat( | |
PandasTypeError: Type expectation mismatch on noteId: found=object expected=int64 | |
PandasTypeError: DataFrame concat on noteId: output=object inputs=[dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('O'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64')] (allowed) | |
PandasTypeError: DataFrame concat on internalNoteIntercept: output=float32 inputs=[dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('O'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')] (allowed) | |
PandasTypeError: DataFrame concat on internalNoteFactor1: output=float32 inputs=[dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('O'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')] (allowed) | |
PandasTypeError: DataFrame concat on lowDiligenceNoteIntercept: output=float32 inputs=[dtype('float32'), dtype('float32'), dtype('float32'), dtype('float64'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('O'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')] (allowed) | |
PandasTypeError: DataFrame concat on lowDiligenceNoteFactor1: output=float32 inputs=[dtype('float32'), dtype('float32'), dtype('float32'), dtype('float64'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('O'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')] (allowed) | |
PandasTypeError: Type expectation mismatch on noteId: found=object expected=int64 | |
/home/ubuntu/communitynotes/sourcecode/scoring/pandas_utils.py:364: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation. | |
result = self._origConcat(*args, **kwargs) | |
/home/ubuntu/communitynotes/sourcecode/scoring/pandas_utils.py:364: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation. | |
result = self._origConcat(*args, **kwargs) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/run_scoring.py, in combine_prescorer_scorer_results, at line 505: raterParamsUnfilteredMultiScorers = pd.concat( | |
PandasTypeError: DataFrame concat on internalRaterIntercept: output=float32 inputs=[dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('O'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')] (allowed) | |
PandasTypeError: DataFrame concat on internalRaterFactor1: output=float32 inputs=[dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('O'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')] (allowed) | |
PandasTypeError: DataFrame concat on crhCrnhRatioDifference: output=float64 inputs=[dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('O'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64')] (allowed) | |
PandasTypeError: DataFrame concat on meanNoteScore: output=float64 inputs=[dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('O'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64')] (allowed) | |
PandasTypeError: DataFrame concat on raterAgreeRatio: output=float64 inputs=[dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('O'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64')] (allowed) | |
PandasTypeError: DataFrame concat on aboveHelpfulnessThreshold: output=object inputs=[dtype('bool'), dtype('bool'), dtype('bool'), dtype('float64'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('O'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('bool')] (allowed) | |
PandasTypeError: DataFrame concat on internalRaterReputation: output=float32 inputs=[dtype('float64'), dtype('float64'), dtype('float64'), dtype('float32'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64')] (allowed) | |
PandasTypeError: DataFrame concat on lowDiligenceRaterIntercept: output=float32 inputs=[dtype('float32'), dtype('float32'), dtype('float32'), dtype('float64'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('O'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')] (allowed) | |
PandasTypeError: DataFrame concat on lowDiligenceRaterFactor1: output=float32 inputs=[dtype('float32'), dtype('float32'), dtype('float32'), dtype('float64'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('O'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')] (allowed) | |
PandasTypeError: DataFrame concat on lowDiligenceRaterReputation: output=float32 inputs=[dtype('float32'), dtype('float32'), dtype('float32'), dtype('float64'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('O'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')] (allowed) | |
PandasTypeError: DataFrame concat on incorrectTagRatingsMadeByRater: output=Int64 inputs=[Int64Dtype(), Int64Dtype(), Int64Dtype(), dtype('float64'), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int8Dtype(), Int64Dtype()] (allowed) | |
INFO:birdwatch.run_scoring:notes total RAM: 125118992 bytes (0.125 GB) | |
column dtype RAM | |
0 noteId int64 12670240 | |
1 noteAuthorParticipantId object 12670240 | |
2 createdAtMillis int64 12670240 | |
3 tweetId object 12670240 | |
4 classification object 12670240 | |
5 believable category 1583904 | |
6 harmful category 1583904 | |
7 validationDifficulty category 1583904 | |
8 misleadingOther Int8 3167560 | |
9 misleadingFactualError Int8 3167560 | |
10 misleadingManipulatedMedia Int8 3167560 | |
11 misleadingOutdatedInformation Int8 3167560 | |
12 misleadingMissingImportantContext Int8 3167560 | |
13 misleadingUnverifiedClaimAsFact Int8 3167560 | |
14 misleadingSatire Int8 3167560 | |
15 notMisleadingOther Int8 3167560 | |
16 notMisleadingFactuallyCorrect Int8 3167560 | |
17 notMisleadingOutdatedButNotWhenWritten Int8 3167560 | |
18 notMisleadingClearlySatire Int8 3167560 | |
19 notMisleadingPersonalOpinion Int8 3167560 | |
20 trustworthySources Int8 3167560 | |
21 summary object 12670240 | |
22 isMediaNote Int8 3167560 | |
INFO:birdwatch.run_scoring:ratings total RAM: 13424916000 bytes (13.425 GB) | |
column dtype RAM | |
0 noteId int64 967561504 | |
1 raterParticipantId object 967561504 | |
2 createdAtMillis int64 967561504 | |
3 version Int8 241890376 | |
4 agree Int8 241890376 | |
5 disagree Int8 241890376 | |
6 helpful Int8 241890376 | |
7 notHelpful Int8 241890376 | |
8 helpfulnessLevel category 120945320 | |
9 helpfulOther Int8 241890376 | |
10 helpfulInformative Int8 241890376 | |
11 helpfulClear Int8 241890376 | |
12 helpfulEmpathetic Int8 241890376 | |
13 helpfulGoodSources Int8 241890376 | |
14 helpfulUniqueContext Int8 241890376 | |
15 helpfulAddressesClaim Int8 241890376 | |
16 helpfulImportantContext Int8 241890376 | |
17 helpfulUnbiasedLanguage Int8 241890376 | |
18 notHelpfulOther Int8 241890376 | |
19 notHelpfulIncorrect Int8 241890376 | |
20 notHelpfulSourcesMissingOrUnreliable Int8 241890376 | |
21 notHelpfulOpinionSpeculationOrBias Int8 241890376 | |
22 notHelpfulMissingKeyPoints Int8 241890376 | |
23 notHelpfulOutdated Int8 241890376 | |
24 notHelpfulHardToUnderstand Int8 241890376 | |
25 notHelpfulArgumentativeOrBiased Int8 241890376 | |
26 notHelpfulOffTopic Int8 241890376 | |
27 notHelpfulSpamHarassmentOrAbuse Int8 241890376 | |
28 notHelpfulIrrelevantSources Int8 241890376 | |
29 notHelpfulOpinionSpeculation Int8 241890376 | |
30 notHelpfulNoteNotNeeded Int8 241890376 | |
31 ratedOnTweetId int64 967561504 | |
32 helpfulNum float64 967561504 | |
33 postSelectionValue float64 967561504 | |
34 postSelectionValue_note_author float64 967561504 | |
INFO:birdwatch.run_scoring:noteStatusHistory total RAM: 230089929 bytes (0.230 GB) | |
column dtype RAM | |
0 noteId int64 14269032 | |
1 noteAuthorParticipantId object 14269032 | |
2 createdAtMillis float64 14269032 | |
3 timestampMillisOfFirstNonNMRStatus float64 14269032 | |
4 firstNonNMRStatus category 1783753 | |
5 timestampMillisOfCurrentStatus float64 14269032 | |
6 currentStatus category 1783761 | |
7 timestampMillisOfLatestNonNMRStatus float64 14269032 | |
8 mostRecentNonNMRStatus category 1783753 | |
9 timestampMillisOfStatusLock float64 14269032 | |
10 lockedStatus category 1783761 | |
11 timestampMillisOfRetroLock float64 14269032 | |
12 currentCoreStatus category 1783761 | |
13 currentExpansionStatus category 1783761 | |
14 currentGroupStatus category 1783761 | |
15 currentDecidedBy category 1784377 | |
16 currentModelingGroup float64 14269032 | |
17 timestampMillisOfMostRecentStatusChange float64 14269032 | |
18 timestampMillisOfNmrDueToMinStableCrhTime float64 14269032 | |
19 currentMultiGroupStatus category 1783761 | |
20 currentModelingMultiGroup float64 14269032 | |
21 timestampMinuteOfFinalScoringOutput float64 14269032 | |
22 timestampMillisOfFirstNmrDueToMinStableCrhTime float64 14269032 | |
23 classification object 14269032 | |
INFO:birdwatch.run_scoring:userEnrollment total RAM: 60314631 bytes (0.060 GB) | |
column dtype RAM | |
0 participantId object 8465208 | |
1 enrollmentState object 8465208 | |
2 successfulRatingNeededToEarnIn int64 8465208 | |
3 timestampOfLastStateChange int64 8465208 | |
4 timestampOfLastEarnOut float64 8465208 | |
5 modelingPopulation category 1058175 | |
6 modelingGroup float64 8465208 | |
7 numberOfTimesEarnedOut int64 8465208 | |
INFO:birdwatch.run_scoring:prescoringNoteModelOutput total RAM: 293538924 bytes (0.294 GB) | |
column dtype RAM | |
0 noteId object 65230872 | |
1 internalNoteIntercept float32 32615436 | |
2 internalNoteFactor1 float32 32615436 | |
3 scorerName object 65230872 | |
4 lowDiligenceNoteIntercept float32 32615436 | |
5 lowDiligenceNoteFactor1 float32 32615436 | |
6 lowDiligenceNoteInterceptRound2 float32 32615436 | |
INFO:birdwatch.run_scoring:prescoringRaterModelOutput total RAM: 407235030 bytes (0.407 GB) | |
column dtype RAM | |
0 raterParticipantId object 32256240 | |
1 internalRaterIntercept float32 16128120 | |
2 internalRaterFactor1 float32 16128120 | |
3 crhCrnhRatioDifference float64 32256240 | |
4 meanNoteScore float64 32256240 | |
5 raterAgreeRatio float64 32256240 | |
6 aboveHelpfulnessThreshold object 32256240 | |
7 scorerName object 32256240 | |
8 internalRaterReputation float32 16128120 | |
9 lowDiligenceRaterIntercept float32 16128120 | |
10 lowDiligenceRaterFactor1 float32 16128120 | |
11 lowDiligenceRaterReputation float32 16128120 | |
12 lowDiligenceRaterInterceptRound2 float32 16128120 | |
13 incorrectTagRatingsMadeByRater Int64 36288270 | |
14 totalRatingsMadeByRater float64 32256240 | |
15 postSelectionValue float64 32256240 | |
INFO:birdwatch.constants:Logging Prescoring Results RAM usage (before conversion) elapsed time: 0.06 secs (0.00 mins) | |
INFO:birdwatch.run_scoring:notes total RAM: 125118992 bytes (0.125 GB) | |
column dtype RAM | |
0 noteId int64 12670240 | |
1 noteAuthorParticipantId object 12670240 | |
2 createdAtMillis int64 12670240 | |
3 tweetId object 12670240 | |
4 classification object 12670240 | |
5 believable category 1583904 | |
6 harmful category 1583904 | |
7 validationDifficulty category 1583904 | |
8 misleadingOther Int8 3167560 | |
9 misleadingFactualError Int8 3167560 | |
10 misleadingManipulatedMedia Int8 3167560 | |
11 misleadingOutdatedInformation Int8 3167560 | |
12 misleadingMissingImportantContext Int8 3167560 | |
13 misleadingUnverifiedClaimAsFact Int8 3167560 | |
14 misleadingSatire Int8 3167560 | |
15 notMisleadingOther Int8 3167560 | |
16 notMisleadingFactuallyCorrect Int8 3167560 | |
17 notMisleadingOutdatedButNotWhenWritten Int8 3167560 | |
18 notMisleadingClearlySatire Int8 3167560 | |
19 notMisleadingPersonalOpinion Int8 3167560 | |
20 trustworthySources Int8 3167560 | |
21 summary object 12670240 | |
22 isMediaNote Int8 3167560 | |
INFO:birdwatch.run_scoring:ratings total RAM: 13424916000 bytes (13.425 GB) | |
column dtype RAM | |
0 noteId int64 967561504 | |
1 raterParticipantId object 967561504 | |
2 createdAtMillis int64 967561504 | |
3 version Int8 241890376 | |
4 agree Int8 241890376 | |
5 disagree Int8 241890376 | |
6 helpful Int8 241890376 | |
7 notHelpful Int8 241890376 | |
8 helpfulnessLevel category 120945320 | |
9 helpfulOther Int8 241890376 | |
10 helpfulInformative Int8 241890376 | |
11 helpfulClear Int8 241890376 | |
12 helpfulEmpathetic Int8 241890376 | |
13 helpfulGoodSources Int8 241890376 | |
14 helpfulUniqueContext Int8 241890376 | |
15 helpfulAddressesClaim Int8 241890376 | |
16 helpfulImportantContext Int8 241890376 | |
17 helpfulUnbiasedLanguage Int8 241890376 | |
18 notHelpfulOther Int8 241890376 | |
19 notHelpfulIncorrect Int8 241890376 | |
20 notHelpfulSourcesMissingOrUnreliable Int8 241890376 | |
21 notHelpfulOpinionSpeculationOrBias Int8 241890376 | |
22 notHelpfulMissingKeyPoints Int8 241890376 | |
23 notHelpfulOutdated Int8 241890376 | |
24 notHelpfulHardToUnderstand Int8 241890376 | |
25 notHelpfulArgumentativeOrBiased Int8 241890376 | |
26 notHelpfulOffTopic Int8 241890376 | |
27 notHelpfulSpamHarassmentOrAbuse Int8 241890376 | |
28 notHelpfulIrrelevantSources Int8 241890376 | |
29 notHelpfulOpinionSpeculation Int8 241890376 | |
30 notHelpfulNoteNotNeeded Int8 241890376 | |
31 ratedOnTweetId int64 967561504 | |
32 helpfulNum float64 967561504 | |
33 postSelectionValue float64 967561504 | |
34 postSelectionValue_note_author float64 967561504 | |
INFO:birdwatch.run_scoring:noteStatusHistory total RAM: 230089929 bytes (0.230 GB) | |
column dtype RAM | |
0 noteId int64 14269032 | |
1 noteAuthorParticipantId object 14269032 | |
2 createdAtMillis float64 14269032 | |
3 timestampMillisOfFirstNonNMRStatus float64 14269032 | |
4 firstNonNMRStatus category 1783753 | |
5 timestampMillisOfCurrentStatus float64 14269032 | |
6 currentStatus category 1783761 | |
7 timestampMillisOfLatestNonNMRStatus float64 14269032 | |
8 mostRecentNonNMRStatus category 1783753 | |
9 timestampMillisOfStatusLock float64 14269032 | |
10 lockedStatus category 1783761 | |
11 timestampMillisOfRetroLock float64 14269032 | |
12 currentCoreStatus category 1783761 | |
13 currentExpansionStatus category 1783761 | |
14 currentGroupStatus category 1783761 | |
15 currentDecidedBy category 1784377 | |
16 currentModelingGroup float64 14269032 | |
17 timestampMillisOfMostRecentStatusChange float64 14269032 | |
18 timestampMillisOfNmrDueToMinStableCrhTime float64 14269032 | |
19 currentMultiGroupStatus category 1783761 | |
20 currentModelingMultiGroup float64 14269032 | |
21 timestampMinuteOfFinalScoringOutput float64 14269032 | |
22 timestampMillisOfFirstNmrDueToMinStableCrhTime float64 14269032 | |
23 classification object 14269032 | |
INFO:birdwatch.run_scoring:userEnrollment total RAM: 60314631 bytes (0.060 GB) | |
column dtype RAM | |
0 participantId object 8465208 | |
1 enrollmentState object 8465208 | |
2 successfulRatingNeededToEarnIn int64 8465208 | |
3 timestampOfLastStateChange int64 8465208 | |
4 timestampOfLastEarnOut float64 8465208 | |
5 modelingPopulation category 1058175 | |
6 modelingGroup float64 8465208 | |
7 numberOfTimesEarnedOut int64 8465208 | |
INFO:birdwatch.run_scoring:prescoringNoteModelOutput total RAM: 293538924 bytes (0.294 GB) | |
column dtype RAM | |
0 noteId object 65230872 | |
1 internalNoteIntercept float32 32615436 | |
2 internalNoteFactor1 float32 32615436 | |
3 scorerName object 65230872 | |
4 lowDiligenceNoteIntercept float32 32615436 | |
5 lowDiligenceNoteFactor1 float32 32615436 | |
6 lowDiligenceNoteInterceptRound2 float32 32615436 | |
INFO:birdwatch.run_scoring:prescoringRaterModelOutput total RAM: 407235030 bytes (0.407 GB) | |
column dtype RAM | |
0 raterParticipantId object 32256240 | |
1 internalRaterIntercept float32 16128120 | |
2 internalRaterFactor1 float32 16128120 | |
3 crhCrnhRatioDifference float64 32256240 | |
4 meanNoteScore float64 32256240 | |
5 raterAgreeRatio float64 32256240 | |
6 aboveHelpfulnessThreshold object 32256240 | |
7 scorerName object 32256240 | |
8 internalRaterReputation float32 16128120 | |
9 lowDiligenceRaterIntercept float32 16128120 | |
10 lowDiligenceRaterFactor1 float32 16128120 | |
11 lowDiligenceRaterReputation float32 16128120 | |
12 lowDiligenceRaterInterceptRound2 float32 16128120 | |
13 incorrectTagRatingsMadeByRater Int64 36288270 | |
14 totalRatingsMadeByRater float64 32256240 | |
15 postSelectionValue float64 32256240 | |
INFO:birdwatch.constants:Logging Prescoring Results RAM usage (after conversion) elapsed time: 0.05 secs (0.00 mins) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/run_scoring.py, in run_prescoring, at line 1187: prescoringRaterModelOutput = pd.concat( | |
PandasTypeError: DataFrame concat on postSelectionValue: output=float64 inputs=[dtype('float64'), dtype('int64')] (allowed) | |
INFO:birdwatch.run_scoring:prescoringRaterModelOutput total RAM: 407339161 bytes (0.407 GB) | |
column dtype RAM | |
0 raterParticipantId object 32264488 | |
1 internalRaterIntercept float32 16132244 | |
2 internalRaterFactor1 float32 16132244 | |
3 crhCrnhRatioDifference float64 32264488 | |
4 meanNoteScore float64 32264488 | |
5 raterAgreeRatio float64 32264488 | |
6 aboveHelpfulnessThreshold object 32264488 | |
7 scorerName object 32264488 | |
8 internalRaterReputation float32 16132244 | |
9 lowDiligenceRaterIntercept float32 16132244 | |
10 lowDiligenceRaterFactor1 float32 16132244 | |
11 lowDiligenceRaterReputation float32 16132244 | |
12 lowDiligenceRaterInterceptRound2 float32 16132244 | |
13 incorrectTagRatingsMadeByRater Int64 36297549 | |
14 totalRatingsMadeByRater float64 32264488 | |
15 postSelectionValue float64 32264488 | |
INFO:birdwatch.constants:Logging Prescoring Results RAM usage (after concatenation) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.run_scoring:Initial value of OPENBLAS_NUM_THREADS: None | |
INFO:birdwatch.run_scoring:New value of OPENBLAS_NUM_THREADS: 1 | |
INFO:birdwatch.pflip_model:seeding pflip: 0 | |
INFO:birdwatch.pflip_model:total ratings considered for pflip model: 120945188 | |
INFO:birdwatch.pflip_model:total ratings before initial note status for pflip model: 99035137 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/pflip_model.py, in _get_recent_notes, at line 303: noteStatusHistory[[c.noteIdKey, c.createdAtMillisKey]].merge( | |
PandasTypeError: Input mismatch on createdAtMillis: left=float64 vs right=int64 (UNALLOWED) | |
PandasTypeError: Merge key mismatch on createdAtMillis: left=float64 vs right=int64 (UNALLOWED) | |
INFO:birdwatch.pflip_model:labels before ScoringDriftGuard: | |
LABEL | |
CRH 126611 | |
FLIP 51470 | |
Name: count, dtype: int64 | |
INFO:birdwatch.pflip_model:labels after ScoringDriftGuard: | |
LABEL | |
CRH 107480 | |
FLIP 51470 | |
Name: count, dtype: int64 | |
INFO:birdwatch.pflip_model:labels after restricting to recent notes: | |
LABEL | |
CRH 76098 | |
FLIP 34036 | |
Name: count, dtype: int64 | |
INFO:birdwatch.pflip_model:total ratings included in pflip model: 6994350 | |
INFO:birdwatch.pflip_model:noteInfo summary: e2c24b68b25a9bbff699622f6a89a560945562677f9c085e5dd585d6e6590771 | |
INFO:birdwatch.pflip_model:pflip training data size: 99120 | |
INFO:birdwatch.pflip_model:trainDataFrame summary: 498bd614900c763a5f6fd9998411d7751c26971b9ea540baa3b8f843d57913b7 | |
INFO:birdwatch.pflip_model:pflip validation data size: 11014 | |
INFO:birdwatch.pflip_model:validationDataFrame summary: e7eb501363bac44ad90adb6a363d1d0c0782a50722ca1e9e196120c0b16c474e | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/feature_extraction/text.py:525: UserWarning: The parameter 'token_pattern' will not be used since 'tokenizer' is not None' | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/feature_extraction/text.py:525: UserWarning: The parameter 'token_pattern' will not be used since 'tokenizer' is not None' | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 0 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 6 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 7 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 8 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 9 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 10 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 11 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 12 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 13 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 14 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 15 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 1 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 2 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 4 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 5 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 7 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 8 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 9 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 10 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 12 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 13 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 15 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 16 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 17 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 18 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 19 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 20 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 21 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 22 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 23 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 24 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 25 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 26 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 27 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 28 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 29 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 30 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 31 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 33 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 34 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 35 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 36 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 37 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 38 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 39 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 40 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 41 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 42 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 43 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 44 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/communitynotes/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:239: FutureWarning: In version 1.5 onwards, subsample=200_000 will be used by default. Set subsample explicitly to silence this warning in the mean time. Set subsample=None to disable subsampling explicitly. | |
warnings.warn( | |
INFO:birdwatch.pflip_model:Training Results: | |
INFO:birdwatch.pflip_model:threshold=-7.586273318203936 tpr=0.7354495628343991 fpr=0.25001460536308934 auc=0.8297859913674852 | |
INFO:birdwatch.pflip_model:Validation Results: | |
INFO:birdwatch.pflip_model:threshold=-7.586273318203936 tpr=0.6991725768321513 fpr=0.26251638269986893 auc=0.7994188595472024 | |
INFO:birdwatch.run_scoring:Final value of OPENBLAS_NUM_THREADS: None | |
INFO:birdwatch.constants:Fitting pflip model elapsed time: 387.40 secs (6.46 mins) | |
INFO:birdwatch.run_scoring:We invoked run_scoring and are now in between prescoring and scoring. | |
INFO:birdwatch.run_scoring:Starting final scoring | |
INFO:birdwatch.run_scoring:notes total RAM: 125118992 bytes (0.125 GB) | |
column dtype RAM | |
0 noteId int64 12670240 | |
1 noteAuthorParticipantId object 12670240 | |
2 createdAtMillis int64 12670240 | |
3 tweetId object 12670240 | |
4 classification object 12670240 | |
5 believable category 1583904 | |
6 harmful category 1583904 | |
7 validationDifficulty category 1583904 | |
8 misleadingOther Int8 3167560 | |
9 misleadingFactualError Int8 3167560 | |
10 misleadingManipulatedMedia Int8 3167560 | |
11 misleadingOutdatedInformation Int8 3167560 | |
12 misleadingMissingImportantContext Int8 3167560 | |
13 misleadingUnverifiedClaimAsFact Int8 3167560 | |
14 misleadingSatire Int8 3167560 | |
15 notMisleadingOther Int8 3167560 | |
16 notMisleadingFactuallyCorrect Int8 3167560 | |
17 notMisleadingOutdatedButNotWhenWritten Int8 3167560 | |
18 notMisleadingClearlySatire Int8 3167560 | |
19 notMisleadingPersonalOpinion Int8 3167560 | |
20 trustworthySources Int8 3167560 | |
21 summary object 12670240 | |
22 isMediaNote Int8 3167560 | |
INFO:birdwatch.run_scoring:ratings total RAM: 11549335002 bytes (11.549 GB) | |
column dtype RAM | |
0 noteId int64 972575568 | |
1 raterParticipantId object 972575568 | |
2 createdAtMillis int64 972575568 | |
3 version Int8 243143892 | |
4 agree Int8 243143892 | |
5 disagree Int8 243143892 | |
6 helpful Int8 243143892 | |
7 notHelpful Int8 243143892 | |
8 helpfulnessLevel category 121572078 | |
9 helpfulOther Int8 243143892 | |
10 helpfulInformative Int8 243143892 | |
11 helpfulClear Int8 243143892 | |
12 helpfulEmpathetic Int8 243143892 | |
13 helpfulGoodSources Int8 243143892 | |
14 helpfulUniqueContext Int8 243143892 | |
15 helpfulAddressesClaim Int8 243143892 | |
16 helpfulImportantContext Int8 243143892 | |
17 helpfulUnbiasedLanguage Int8 243143892 | |
18 notHelpfulOther Int8 243143892 | |
19 notHelpfulIncorrect Int8 243143892 | |
20 notHelpfulSourcesMissingOrUnreliable Int8 243143892 | |
21 notHelpfulOpinionSpeculationOrBias Int8 243143892 | |
22 notHelpfulMissingKeyPoints Int8 243143892 | |
23 notHelpfulOutdated Int8 243143892 | |
24 notHelpfulHardToUnderstand Int8 243143892 | |
25 notHelpfulArgumentativeOrBiased Int8 243143892 | |
26 notHelpfulOffTopic Int8 243143892 | |
27 notHelpfulSpamHarassmentOrAbuse Int8 243143892 | |
28 notHelpfulIrrelevantSources Int8 243143892 | |
29 notHelpfulOpinionSpeculation Int8 243143892 | |
30 notHelpfulNoteNotNeeded Int8 243143892 | |
31 ratedOnTweetId int64 972575568 | |
32 helpfulNum float64 972575568 | |
INFO:birdwatch.run_scoring:noteStatusHistory total RAM: 230089929 bytes (0.230 GB) | |
column dtype RAM | |
0 noteId int64 14269032 | |
1 noteAuthorParticipantId object 14269032 | |
2 createdAtMillis float64 14269032 | |
3 timestampMillisOfFirstNonNMRStatus float64 14269032 | |
4 firstNonNMRStatus category 1783753 | |
5 timestampMillisOfCurrentStatus float64 14269032 | |
6 currentStatus category 1783761 | |
7 timestampMillisOfLatestNonNMRStatus float64 14269032 | |
8 mostRecentNonNMRStatus category 1783753 | |
9 timestampMillisOfStatusLock float64 14269032 | |
10 lockedStatus category 1783761 | |
11 timestampMillisOfRetroLock float64 14269032 | |
12 currentCoreStatus category 1783761 | |
13 currentExpansionStatus category 1783761 | |
14 currentGroupStatus category 1783761 | |
15 currentDecidedBy category 1784377 | |
16 currentModelingGroup float64 14269032 | |
17 timestampMillisOfMostRecentStatusChange float64 14269032 | |
18 timestampMillisOfNmrDueToMinStableCrhTime float64 14269032 | |
19 currentMultiGroupStatus category 1783761 | |
20 currentModelingMultiGroup float64 14269032 | |
21 timestampMinuteOfFinalScoringOutput float64 14269032 | |
22 timestampMillisOfFirstNmrDueToMinStableCrhTime float64 14269032 | |
23 classification object 14269032 | |
INFO:birdwatch.run_scoring:userEnrollment total RAM: 60314631 bytes (0.060 GB) | |
column dtype RAM | |
0 participantId object 8465208 | |
1 enrollmentState object 8465208 | |
2 successfulRatingNeededToEarnIn int64 8465208 | |
3 timestampOfLastStateChange int64 8465208 | |
4 timestampOfLastEarnOut float64 8465208 | |
5 modelingPopulation category 1058175 | |
6 modelingGroup float64 8465208 | |
7 numberOfTimesEarnedOut int64 8465208 | |
INFO:birdwatch.run_scoring:prescoringNoteModelOutput total RAM: 293538924 bytes (0.294 GB) | |
column dtype RAM | |
0 noteId object 65230872 | |
1 internalNoteIntercept float32 32615436 | |
2 internalNoteFactor1 float32 32615436 | |
3 scorerName object 65230872 | |
4 lowDiligenceNoteIntercept float32 32615436 | |
5 lowDiligenceNoteFactor1 float32 32615436 | |
6 lowDiligenceNoteInterceptRound2 float32 32615436 | |
INFO:birdwatch.run_scoring:prescoringRaterModelOutput total RAM: 407339161 bytes (0.407 GB) | |
column dtype RAM | |
0 raterParticipantId object 32264488 | |
1 internalRaterIntercept float32 16132244 | |
2 internalRaterFactor1 float32 16132244 | |
3 crhCrnhRatioDifference float64 32264488 | |
4 meanNoteScore float64 32264488 | |
5 raterAgreeRatio float64 32264488 | |
6 aboveHelpfulnessThreshold object 32264488 | |
7 scorerName object 32264488 | |
8 internalRaterReputation float32 16132244 | |
9 lowDiligenceRaterIntercept float32 16132244 | |
10 lowDiligenceRaterFactor1 float32 16132244 | |
11 lowDiligenceRaterReputation float32 16132244 | |
12 lowDiligenceRaterInterceptRound2 float32 16132244 | |
13 incorrectTagRatingsMadeByRater Int64 36297549 | |
14 totalRatingsMadeByRater float64 32264488 | |
15 postSelectionValue float64 32264488 | |
INFO:birdwatch.constants:Logging Final Scoring RAM usage elapsed time: 0.06 secs (0.00 mins) | |
INFO:birdwatch.run_scoring:No previous scored notes passed; scoring all notes. | |
INFO:birdwatch.run_scoring:2. Rescore all recently created notes if not rescored at the minimum frequency. | |
INFO:birdwatch.run_scoring:Num notes created recently: 41255 | |
INFO:birdwatch.run_scoring:3. Rescore all notes that flipped status in the previous scoring run. 33 | |
INFO:birdwatch.run_scoring:4. Rescore all recently-flipped notes if not rescored at the minimum frequency. | |
INFO:birdwatch.run_scoring:Num notes flipped recently: 0 | |
INFO:birdwatch.run_scoring:Num notes not rescored recently enough: 1721496 | |
INFO:birdwatch.run_scoring:5. Rescore all notes that were NMRed due to MinStableCrhTime was not met. 22 | |
INFO:birdwatch.run_scoring:6. Rescore recent unlocked notes that are eligible for locking 11360 | |
INFO:birdwatch.run_scoring:---- | |
Notes to rescore: | |
* 0 notes with new ratings since last scoring run. | |
* 36375 notes created recently and not rescored recently enough. | |
* 33 notes that flipped status in the previous scoring run. | |
* 0 notes that flipped status recently and not rescored recently enough. | |
* 22 notes that were NMRed due to MinStableCrhTime was not met. | |
* 11360 recent notes that are eligible to lock but haven't locked yet. | |
Overall: 47736 notes to rescore, out of 1583780 total. | |
---- | |
INFO:birdwatch.constants:Determine which notes to score. elapsed time: 0.08 secs (0.00 mins) | |
INFO:birdwatch.process_data:Timestamp of latest rating in data: 2025-01-12 01:03:22.523000 | |
INFO:birdwatch.process_data:Timestamp of latest note in data: 2025-01-12 01:02:59.773000 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_status_history.py, in merge_note_info, at line 31: newNoteStatusHistory = oldNoteStatusHistory.merge( | |
PandasTypeError: Input mismatch on createdAtMillis: left=float64 vs right=int64 (allowed) | |
PandasTypeError: Output mismatch on createdAtMillis_notes: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.note_status_history:total notes added to noteStatusHistory: 0 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_status_history.py, in merge_note_info, at line 57: newNoteStatusHistory[[c.noteIdKey, c.createdAtMillisKey]].merge( | |
PandasTypeError: Input mismatch on createdAtMillis: left=float64 vs right=int64 (allowed) | |
PandasTypeError: Merge key mismatch on createdAtMillis: left=float64 vs right=int64 (allowed) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/process_data.py, in _filter_misleading_notes, at line 270: ratings = ratings.merge( | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.process_data:Preprocess Data: Filter misleading notes, starting with 121571946 ratings on 1591576 notes | |
INFO:birdwatch.process_data: Keeping 87726864 ratings on 1071361 misleading notes | |
INFO:birdwatch.process_data: Keeping 8970460 ratings on 152922 deleted notes that were previously scored (in note status history) | |
INFO:birdwatch.process_data: Removing 0 ratings on 0 older notes that aren't deleted, but are not-misleading. | |
INFO:birdwatch.process_data: Removing 0 ratings on 0 notes that were deleted and not in note status history (e.g. old). | |
INFO:birdwatch.process_data:Num Ratings: 121571946, Num Unique Notes Rated: 1591576, Num Unique Raters: 1057435 | |
INFO:birdwatch.constants:Preprocess smaller dataset since we skipped preprocessing at read time elapsed time: 477.86 secs (7.96 mins) | |
INFO:birdwatch.topic_model:Assigning notes to topics: | |
INFO:birdwatch.constants:Get Note Topics: Predict elapsed time: 81.74 secs (1.36 mins) | |
INFO:birdwatch.topic_model: Notes unassigned due to multiple matches: 1737 | |
INFO:birdwatch.constants:Get Note Topics: Make Seed Labels elapsed time: 83.26 secs (1.39 mins) | |
INFO:birdwatch.topic_model: Post Topic assignment results: [908706 26730 54332 2365] | |
INFO:birdwatch.topic_model: Note Topic assignment results: | |
noteTopic | |
GazaConflict 112514 | |
UkraineConflict 45735 | |
MessiRonaldo 4054 | |
Name: count, dtype: int64 | |
INFO:birdwatch.constants:Get Note Topics: Merge and assign predictions elapsed time: 1.74 secs (0.03 mins) | |
INFO:birdwatch.constants:Note Topic Assignment elapsed time: 185.87 secs (3.10 mins) | |
INFO:birdwatch.run_scoring:Post Selection Similarity Final Scoring: begin with 121571946 ratings. | |
/home/ubuntu/communitynotes/sourcecode/scoring/post_selection_similarity.py:111: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratingsWithPostSelectionSimilarityValue.sort_values( | |
/home/ubuntu/communitynotes/sourcecode/scoring/post_selection_similarity.py:114: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratingsWithPostSelectionSimilarityValue.drop_duplicates( | |
INFO:birdwatch.run_scoring:Post Selection Similarity Final Scoring: 120945188 ratings remaining. | |
INFO:birdwatch.constants:Post Selection Similarity: Final Scoring elapsed time: 300.55 secs (5.01 mins) | |
INFO:birdwatch.run_scoring:Starting parallel scorer execution with 23 scorers. | |
INFO:birdwatch.run_scoring:MFCoreScorer run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.run_scoring:MFExpansionScorer run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.run_scoring:MFExpansionPlusScorer run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.run_scoring:ReputationScorer run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.run_scoring:MFGroupScorer_13 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.run_scoring:MFGroupScorer_12 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.run_scoring:MFGroupScorer_13 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_13 run_scorer_parallelizable: Loading data elapsed time: 31.17 secs (0.52 mins) | |
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_13 set to: 8 | |
INFO:birdwatch.run_scoring:ReputationScorer run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:ReputationScorer run_scorer_parallelizable: Loading data elapsed time: 31.80 secs (0.53 mins) | |
INFO:birdwatch.scorer:score_final: Torch intra-op parallelism for ReputationScorer set to: 12 | |
INFO:birdwatch.run_scoring:MFExpansionScorer run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFExpansionScorer run_scorer_parallelizable: Loading data elapsed time: 32.06 secs (0.53 mins) | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_13. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFExpansionScorer set to: 12 | |
INFO:birdwatch.run_scoring:MFGroupScorer_12 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_12 run_scorer_parallelizable: Loading data elapsed time: 31.97 secs (0.53 mins) | |
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_12 set to: 4 | |
INFO:birdwatch.run_scoring:MFExpansionPlusScorer run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFExpansionPlusScorer run_scorer_parallelizable: Loading data elapsed time: 32.26 secs (0.54 mins) | |
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFExpansionPlusScorer set to: 12 | |
INFO:birdwatch.run_scoring:MFCoreScorer run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFCoreScorer run_scorer_parallelizable: Loading data elapsed time: 32.51 secs (0.54 mins) | |
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFCoreScorer set to: 12 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_12. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.scorer:Filtering ratings for ReputationScorer. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.scorer:Filtering ratings for MFExpansionScorer. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.scorer:Filtering ratings for MFExpansionPlusScorer. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.scorer:Filtering ratings for MFCoreScorer. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings after group filter: 787651 | |
INFO:birdwatch.scorer:MFGroupScorer_12 Filter input elapsed time: 42.24 secs (0.70 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note: | |
Num Ratings: 475575, Num Unique Notes Rated: 32871, Num Unique Raters: 11476 | |
INFO:birdwatch.scorer:MFGroupScorer_12 Prepare ratings elapsed time: 0.28 secs (0.00 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)` | |
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey] | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 4886 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 23120 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 5556 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 4886 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 322927 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 322927 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 4886, Notes: 32833 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 9.835439953705114 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 66.09230454359394 | |
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 10.068276252833039 | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.matrix_factorization:epoch 0 0.15474192798137665 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10589989274740219 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.10260923951864243 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06657733023166656 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.09916044771671295 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06444774568080902 | |
INFO:birdwatch.matrix_factorization:Num epochs: 51 | |
INFO:birdwatch.matrix_factorization:epoch 51 0.09894904494285583 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06454918533563614 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.18792936205863953 | |
INFO:birdwatch.scorer:MFGroupScorer_12 Final helpfulness-filtered MF elapsed time: 3.22 secs (0.05 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_12 final scoring, about to call diligence with 322927 final round ratings. | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key] | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key] | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
noteId ... internalNoteInterceptRound2 | |
0 1691896445111087535 ... -1.836983 | |
1 1710861026604834963 ... -0.291341 | |
2 1710982119822852314 ... 4.525461 | |
3 1712851765647876276 ... -2.326732 | |
4 1712851975610487188 ... -0.918698 | |
... ... ... ... | |
31763 1857577215380070683 ... -0.352948 | |
31764 1864044809939243091 ... -0.294801 | |
31765 1714417073764421719 ... -0.432268 | |
31766 1828127526306169170 ... -0.502986 | |
31767 1873505271571488803 ... -0.661702 | |
[31768 rows x 4 columns], | |
raterInitState: | |
raterParticipantId ... internalRaterInterceptRound2 | |
0 00029D1FDD352D79B5073189C3F2BDF6377581F50D66C1... ... NaN | |
1 00053CDCAC04E3692F4A01305C8F3D093CCE221157D539... ... NaN | |
2 0005983E6E18862483AB372C5B61FEBC1F8A573E7701F9... ... NaN | |
3 000957CF1421B543AEAFEBF835033D3BA5FB1B99FB0AF8... ... NaN | |
4 001041D12A03F39CCB40BEA9458C469323254EEC76348B... ... -0.217365 | |
... ... ... ... | |
23115 FFE87CF4860C52665B228E9F345BB3EE183994416FA6D7... ... NaN | |
23116 FFEEE02BCED1134EB1C57875779C03F2135B72BB4C8E7F... ... 0.393743 | |
23117 FFF3E935633C6870DE7674D0681C5821BC408073C84A36... ... NaN | |
23118 FFFA40CBF0CC13E71072BFE89E80372A5907BD9D2EDA54... ... NaN | |
23119 FFFA43EFB0AAB3BFD273666FF123BFE69D863B9A2F5E44... ... NaN | |
[23120 rows x 5 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 4886, vs. num we are initializing: 23120 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 4886 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 4886, vs. num we are initializing: 23120 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 4886 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 4886, vs. num we are initializing: 23120 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 4886 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 32833, vs. num we are initializing: 31768 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 32233 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 600 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 32833, vs. num we are initializing: 31768 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 32233 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 600 | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=5.909509 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.651407 | time=0.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.619618 | time=0.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.616207 | time=0.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.615797 | time=1.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.615681 | time=1.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=165 | loss=2.615649 | time=1.7s | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 4886, vs. num we are initializing: 23120 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 4886 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.546920 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.484418 | time=0.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.483370 | time=0.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.483336 | time=0.8s | |
INFO:birdwatch.diligence_model:Low diligence final loss: 0.4833 | |
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object') | |
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1', | |
'internalNoteFactor1_max', 'internalNoteFactor1_median', | |
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig', | |
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig', | |
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_12 Low Diligence Reputation Model elapsed time: 3.04 secs (0.05 mins) | |
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFGroupScorer_12 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer: Ratings after group filter: 35923731 | |
INFO:birdwatch.scorer:MFGroupScorer_13 Filter input elapsed time: 54.44 secs (0.91 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.scorer: Ratings after group filter: 120945188 | |
INFO:birdwatch.scorer:MFExpansionPlusScorer Filter input elapsed time: 54.66 secs (0.91 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 494: noteStats = tagAggregates.merge(noteStats, on=c.noteIdKey, how="outer") | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:compute_scored_notes: compute tag aggregates elapsed time: 2.54 secs (0.04 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:82: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals.drop(columns_to_drop, inplace=True, axis=1) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:84: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ( | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:91: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:95: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:99: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 499: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on num_voters_interval: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.constants:compute_scored_notes: compute incorrect aggregates elapsed time: 2.74 secs (0.05 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.62 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.74 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.66 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.59 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.74 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRHInertia (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRHInertia (v1.0) elapsed time: 1.02 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRHInertia (v1.0) elapsed time: 1.68 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: TagFilter (v1.0) | |
INFO:birdwatch.scorer: Ratings after group filter: 104368644 | |
INFO:birdwatch.scorer:ReputationScorer Filter input elapsed time: 65.37 secs (1.09 mins) | |
INFO:birdwatch.reputation_scorer:seeding with 0 | |
INFO:birdwatch.scorer: Ratings after group filter: 104368644 | |
INFO:birdwatch.scorer:MFCoreScorer Filter input elapsed time: 65.25 secs (1.09 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.scoring_rules:Candidate notes prior to tag filtering: 1781732 | |
INFO:birdwatch.scoring_rules:Checking note tags: | |
INFO:birdwatch.scoring_rules:notHelpfulOther | |
INFO:birdwatch.scoring_rules:notHelpfulIncorrect | |
INFO:birdwatch.scoring_rules:notHelpfulSourcesMissingOrUnreliable | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculationOrBias | |
INFO:birdwatch.scoring_rules:notHelpfulMissingKeyPoints | |
INFO:birdwatch.scoring_rules:notHelpfulOutdated | |
INFO:birdwatch.scoring_rules:notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:outlier filtering disabled for tag: notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:notHelpfulArgumentativeOrBiased | |
INFO:birdwatch.scoring_rules:notHelpfulOffTopic | |
INFO:birdwatch.scoring_rules:notHelpfulSpamHarassmentOrAbuse | |
INFO:birdwatch.scoring_rules:notHelpfulIrrelevantSources | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculation | |
INFO:birdwatch.scoring_rules:notHelpfulNoteNotNeeded | |
INFO:birdwatch.scoring_rules:Total {note, tag} pairs where tag filter logic triggered: 903 | |
INFO:birdwatch.scoring_rules:Total unique notes impacted by tag filtering: 564 | |
INFO:birdwatch.constants:Calling score_notes: TagFilter (v1.0) elapsed time: 2.96 secs (0.05 mins) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scoring_rules.py, in apply_scoring_rules, at line 1099: noteColumns = noteColumns.merge( | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:Applying scoring rule: TagFilter (v1.0) elapsed time: 3.56 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: CRHSuperThreshold (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: CRHSuperThreshold (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: CRHSuperThreshold (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: ElevatedCRHInertia (v1.0) | |
INFO:birdwatch.scorer: Ratings after group filter: 120942984 | |
INFO:birdwatch.constants:Calling score_notes: ElevatedCRHInertia (v1.0) elapsed time: 0.92 secs (0.02 mins) | |
INFO:birdwatch.scorer:MFExpansionScorer Filter input elapsed time: 69.33 secs (1.16 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.constants:Applying scoring rule: ElevatedCRHInertia (v1.0) elapsed time: 1.52 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterIncorrect (v1.0) | |
INFO:birdwatch.scoring_rules:Total notes impacted by incorrect filtering: 174 | |
INFO:birdwatch.constants:Calling score_notes: FilterIncorrect (v1.0) elapsed time: 2.91 secs (0.05 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterIncorrect (v1.0) elapsed time: 3.58 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLowDiligence (v1.0) | |
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note: | |
Num Ratings: 35144690, Num Unique Notes Rated: 621125, Num Unique Raters: 220896 | |
INFO:birdwatch.scorer:MFGroupScorer_13 Prepare ratings elapsed time: 21.55 secs (0.36 mins) | |
INFO:birdwatch.scoring_rules:Total notes impacted by low diligence filtering: 7346 | |
INFO:birdwatch.constants:Calling score_notes: FilterLowDiligence (v1.0) elapsed time: 2.92 secs (0.05 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterLowDiligence (v1.0) elapsed time: 3.56 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLargeFactor (v1.0) | |
INFO:birdwatch.scoring_rules:Total notes impacted by large factor filtering: 20 | |
INFO:birdwatch.constants:Calling score_notes: FilterLargeFactor (v1.0) elapsed time: 0.77 secs (0.01 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterLargeFactor (v1.0) elapsed time: 1.39 secs (0.02 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)` | |
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey] | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 104455 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 226673 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 111397 | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 39.60 secs (0.66 mins) | |
INFO:birdwatch.mf_base_scorer:sn cols: Index(['noteId', 'ratingWeight', 'notHelpfulOtherAdjusted', | |
'notHelpfulIncorrectAdjusted', | |
'notHelpfulSourcesMissingOrUnreliableAdjusted', | |
'notHelpfulOpinionSpeculationOrBiasAdjusted', | |
'notHelpfulMissingKeyPointsAdjusted', 'notHelpfulOutdatedAdjusted', | |
'notHelpfulHardToUnderstandAdjusted', | |
'notHelpfulArgumentativeOrBiasedAdjusted', 'notHelpfulOffTopicAdjusted', | |
'notHelpfulSpamHarassmentOrAbuseAdjusted', | |
'notHelpfulIrrelevantSourcesAdjusted', | |
'notHelpfulOpinionSpeculationAdjusted', | |
'notHelpfulNoteNotNeededAdjusted', 'notHelpfulOtherAdjustedRatio', | |
'notHelpfulIncorrectAdjustedRatio', | |
'notHelpfulSourcesMissingOrUnreliableAdjustedRatio', | |
'notHelpfulOpinionSpeculationOrBiasAdjustedRatio', | |
'notHelpfulMissingKeyPointsAdjustedRatio', | |
'notHelpfulOutdatedAdjustedRatio', | |
'notHelpfulHardToUnderstandAdjustedRatio', | |
'notHelpfulArgumentativeOrBiasedAdjustedRatio', | |
'notHelpfulOffTopicAdjustedRatio', | |
'notHelpfulSpamHarassmentOrAbuseAdjustedRatio', | |
'notHelpfulIrrelevantSourcesAdjustedRatio', | |
'notHelpfulOpinionSpeculationAdjustedRatio', | |
'notHelpfulNoteNotNeededAdjustedRatio', 'helpfulOther', | |
'helpfulInformative', 'helpfulClear', 'helpfulEmpathetic', | |
'helpfulGoodSources', 'helpfulUniqueContext', 'helpfulAddressesClaim', | |
'helpfulImportantContext', 'helpfulUnbiasedLanguage', 'notHelpfulOther', | |
'notHelpfulIncorrect', 'notHelpfulSourcesMissingOrUnreliable', | |
'notHelpfulOpinionSpeculationOrBias', 'notHelpfulMissingKeyPoints', | |
'notHelpfulOutdated', 'notHelpfulHardToUnderstand', | |
'notHelpfulArgumentativeOrBiased', 'notHelpfulOffTopic', | |
'notHelpfulSpamHarassmentOrAbuse', 'notHelpfulIrrelevantSources', | |
'notHelpfulOpinionSpeculation', 'notHelpfulNoteNotNeeded', 'numRatings', | |
'noteAuthorParticipantId', 'classification', 'currentStatus', | |
'internalNoteIntercept', 'internalNoteFactor1', | |
'lowDiligenceNoteIntercept', 'internalNoteFactor1_max', | |
'internalNoteFactor1_median', 'internalNoteFactor1_min', | |
'internalNoteFactor1_refit_orig', 'internalNoteIntercept_median', | |
'internalNoteIntercept_refit_orig', 'ratingCount_all', | |
'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'notHelpfulIncorrect_interval', 'p_incorrect_user_interval', | |
'num_voters_interval', 'tf_idf_incorrect_interval', | |
'internalRatingStatus', 'internalActiveRules', 'activeFilterTags', | |
'crhBool', 'crnhBool', 'awaitingBool'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_12 Final compute scored notes elapsed time: 69.30 secs (1.15 mins) | |
INFO:birdwatch.scorer:Postprocessing output for MFGroupScorer_12 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 104455 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 19432602 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 19432602 | |
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note: | |
Num Ratings: 103691510, Num Unique Notes Rated: 1229018, Num Unique Raters: 795028 | |
INFO:birdwatch.scorer:MFCoreScorer Prepare ratings elapsed time: 60.18 secs (1.00 mins) | |
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note: | |
Num Ratings: 120335836, Num Unique Notes Rated: 1323087, Num Unique Raters: 1057223 | |
INFO:birdwatch.scorer:MFExpansionPlusScorer Prepare ratings elapsed time: 71.63 secs (1.19 mins) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 104455, Notes: 620077 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 31.339014348218043 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 186.0380259441865 | |
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 31.414247069723523 | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.matrix_factorization:epoch 0 0.12243042886257172 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09816668182611465 | |
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note: | |
Num Ratings: 120333624, Num Unique Notes Rated: 1323084, Num Unique Raters: 1057188 | |
INFO:birdwatch.scorer:MFExpansionScorer Prepare ratings elapsed time: 72.53 secs (1.21 mins) | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 102895565, Num Unique Notes Rated: 1227415, Num Unique Raters: 599301 | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)` | |
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey] | |
INFO:birdwatch.matrix_factorization:epoch 20 0.10489216446876526 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07929407805204391 | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)` | |
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey] | |
INFO:birdwatch.scorer: Original noteScores length: 1783629 | |
INFO:birdwatch.scorer: Final noteScores length: 5006 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in score_final, at line 1190: noteScores = noteScores.merge( | |
PandasTypeError: Output mismatch on numFinalRoundRatings: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_12 Postprocess output elapsed time: 56.61 secs (0.94 mins) | |
INFO:birdwatch.run_scoring:MFGroupScorer_11 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)` | |
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey] | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10382652282714844 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07794012129306793 | |
INFO:birdwatch.matrix_factorization:Num epochs: 56 | |
INFO:birdwatch.matrix_factorization:epoch 56 0.10370488464832306 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0778801366686821 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1562773585319519 | |
INFO:birdwatch.scorer:MFGroupScorer_13 Final helpfulness-filtered MF elapsed time: 83.62 secs (1.39 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_13 final scoring, about to call diligence with 19432602 final round ratings. | |
INFO:birdwatch.run_scoring:MFGroupScorer_11 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_11 run_scorer_parallelizable: Loading data elapsed time: 27.14 secs (0.45 mins) | |
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_11 set to: 4 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_11. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key] | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key] | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
noteId ... internalNoteInterceptRound2 | |
0 1549781045201047554 ... 0.534424 | |
1 1592925068132245504 ... -0.290130 | |
2 1593079642092617729 ... 1.113199 | |
3 1595167355637796876 ... 0.190297 | |
4 1597230938316054532 ... -1.577253 | |
... ... ... ... | |
618452 1829947533667299426 ... -0.269345 | |
618453 1663589142351970305 ... -0.305630 | |
618454 1741105701643268121 ... 1.320487 | |
618455 1694299885778981367 ... -0.170205 | |
618456 1783727968046432453 ... -0.206472 | |
[618457 rows x 4 columns], | |
raterInitState: | |
raterParticipantId ... internalRaterInterceptRound2 | |
0 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... ... -0.476654 | |
1 00018DBB934257251EBCEE91D0722C71B7DD592A571398... ... NaN | |
2 00022C96980039352E2D04B5E533090FA8BA333F87C5EB... ... 0.232833 | |
3 0002725E706CF18C040E21F30CE2D39994513C3BB8CF58... ... NaN | |
4 000274A83456E40A03B81628F432D06A3506E28C77FEA8... ... NaN | |
... ... ... ... | |
226668 FFFEB058BCC25277E2662DD3E8C0506FB1B23BA4D965EA... ... NaN | |
226669 FFFEB27D6E27351D14EB43777F265F694744ABB4B3B7AD... ... 0.317591 | |
226670 FFFF0C7BF4089C6436CAB332B309A1A81C21E11CD61CE4... ... NaN | |
226671 FFFF3B1E5FB7927B196BCC7753E5CE5B2E64AFA90099E0... ... NaN | |
226672 FFFF7E0B3ADB6FC5FB42B0F01FFD24495410C1AE4AC986... ... -0.162698 | |
[226673 rows x 5 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 104455, vs. num we are initializing: 226673 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 104455 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 104455, vs. num we are initializing: 226673 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 104455 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 104455, vs. num we are initializing: 226673 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 104455 | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 382560 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 582446 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 413599 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 620077, vs. num we are initializing: 618457 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 581274 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 38803 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 620077, vs. num we are initializing: 618457 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 581274 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 38803 | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=4.512995 | time=0.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.436904 | time=11.8s | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 496789 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 722512 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 549629 | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
noteId ... internalNoteInterceptRound2 | |
0 1352796878438424576 ... -0.059589 | |
1 1353415873227177985 ... 0.047713 | |
2 1354586938863443971 ... NaN | |
3 1354588003075764229 ... NaN | |
4 1354588172659920899 ... NaN | |
... ... ... ... | |
1783624 1878254806986281330 ... NaN | |
1783625 1878254878893674977 ... NaN | |
1783626 1878255194250576094 ... NaN | |
1783627 1878255394629542201 ... NaN | |
1783628 1878255526393344046 ... NaN | |
[1783629 rows x 7 columns], | |
raterInitState: | |
raterParticipantId ... internalRaterInterceptRound2 | |
0 0000010BB832A9CFDF102BF7B66896FA987C80FBB61EF6... ... 0.126661 | |
1 000011269AD6F327AED0F4086A732B4052F9D28E8791E1... ... 0.086200 | |
2 0000315D36021A528D85155729DDBF2E299BB8C3040878... ... 0.142033 | |
3 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... ... -0.104796 | |
4 00005300B9017670433392BF6767238D54E058EC25D5C5... ... 0.169990 | |
... ... ... ... | |
599296 FFFFAB2FDBC1968F4CFE97A86D88963D702B636365B6CD... ... 0.183009 | |
599297 FFFFBBAB3C66ABB4DBC2A3B486C3C673345C89B5858465... ... -0.087869 | |
599298 FFFFC819886B2F837503D840D59EE8321A835AAF2B5C1E... ... -0.195781 | |
599299 FFFFD54D8094D7620A7C3E162F98198FBDBD3401A4F2FB... ... -0.394508 | |
599300 FFFFFE8909485374E33854B934713713CAC93CDB50C9D0... ... -0.017513 | |
[599301 rows x 16 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 599301, vs. num we are initializing: 599301 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 599301 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 599301, vs. num we are initializing: 599301 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.410068 | time=23.1s | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 599301 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 599301, vs. num we are initializing: 599301 | |
INFO:birdwatch.scorer: Ratings after group filter: 1761412 | |
INFO:birdwatch.scorer:MFGroupScorer_11 Filter input elapsed time: 46.88 secs (0.78 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note: | |
Num Ratings: 1135370, Num Unique Notes Rated: 94421, Num Unique Raters: 11899 | |
INFO:birdwatch.scorer:MFGroupScorer_11 Prepare ratings elapsed time: 0.55 secs (0.01 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)` | |
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey] | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 599301 | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 6579 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 49542 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 7158 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 6579 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 741462 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 741462 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 6579, Notes: 94215 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 7.869893329087725 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 112.70132238942088 | |
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 7.93073593073593 | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.matrix_factorization:epoch 0 0.13474147021770477 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1061607077717781 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 1227415, vs. num we are initializing: 1783629 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.10054232180118561 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06872962415218353 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 1170798 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 56617 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 382560 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 53274194 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 53274194 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.0989188700914383 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06740477681159973 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09868934005498886 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0671992152929306 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 1227415, vs. num we are initializing: 1783629 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.0986587405204773 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06717830896377563 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 1170798 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 56617 | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor) | |
INFO:birdwatch.matrix_factorization:Num epochs: 93 | |
INFO:birdwatch.matrix_factorization:epoch 93 0.09865555912256241 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06717663258314133 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.16698786616325378 | |
INFO:birdwatch.scorer:MFGroupScorer_11 Final helpfulness-filtered MF elapsed time: 5.29 secs (0.09 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_11 final scoring, about to call diligence with 741462 final round ratings. | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key] | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key] | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
noteId internalNoteIntercept internalNoteFactor1 \ | |
0 1643057880793325568 2.306396 3.599517 | |
1 1660244070621380610 0.521022 -2.256069 | |
2 1681713296271892480 -7.717273 -0.095602 | |
3 1686264916682919936 1.750654 -1.210147 | |
4 1686753388883546113 -0.222489 2.320199 | |
... ... ... ... | |
93088 1819830116945387728 -0.440204 -0.854928 | |
93089 1819849846179627161 -0.441013 -0.854774 | |
93090 1819925754207150395 -0.440286 -0.854607 | |
93091 1870385083439513634 2.043123 2.935036 | |
93092 1714789934614098338 -0.236172 0.939513 | |
internalNoteInterceptRound2 | |
0 2.306396 | |
1 0.521022 | |
2 -7.717273 | |
3 1.750654 | |
4 -0.222489 | |
... ... | |
93088 -0.440204 | |
93089 -0.441013 | |
93090 -0.440286 | |
93091 2.043123 | |
93092 -0.236172 | |
[93093 rows x 4 columns], | |
raterInitState: | |
raterParticipantId \ | |
0 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... | |
1 00018DBB934257251EBCEE91D0722C71B7DD592A571398... | |
2 0002725E706CF18C040E21F30CE2D39994513C3BB8CF58... | |
3 00029D1FDD352D79B5073189C3F2BDF6377581F50D66C1... | |
4 0002D1E11A8EA1E4B25048FA9D117406CE9EB1D3143BC9... | |
... ... | |
49537 FFFA43EFB0AAB3BFD273666FF123BFE69D863B9A2F5E44... | |
49538 FFFA49720F254411E1F79CA757C403F0A0217240BC4922... | |
49539 FFFC011F23086D8153F0A3FF336F33EE80521EC35F9ACD... | |
49540 FFFDAB98EE31EC0CC51169937F859D5B676870C6470C19... | |
49541 FFFEB058BCC25277E2662DD3E8C0506FB1B23BA4D965EA... | |
internalRaterIntercept internalRaterFactor1 internalRaterReputation \ | |
0 NaN NaN NaN | |
1 NaN NaN NaN | |
2 NaN NaN NaN | |
3 NaN NaN NaN | |
4 NaN NaN NaN | |
... ... ... ... | |
49537 NaN NaN NaN | |
49538 0.068667 0.657025 0.783342 | |
49539 NaN NaN NaN | |
49540 NaN NaN NaN | |
49541 NaN NaN NaN | |
internalRaterInterceptRound2 | |
0 NaN | |
1 NaN | |
2 NaN | |
3 NaN | |
4 NaN | |
... ... | |
49537 NaN | |
49538 0.068667 | |
49539 NaN | |
49540 NaN | |
49541 NaN | |
[49542 rows x 5 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 6579, vs. num we are initializing: 49542 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 6579 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 6579, vs. num we are initializing: 49542 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 6579 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 6579, vs. num we are initializing: 49542 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 6579 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 94215, vs. num we are initializing: 93093 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 91655 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 2560 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 94215, vs. num we are initializing: 93093 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 91655 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 2560 | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=7.327492 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.719859 | time=0.7s | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 496867 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 722844 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 549781 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.205070 | time=1.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.408242 | time=34.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.672142 | time=1.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.664829 | time=2.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.663314 | time=2.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.662726 | time=3.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.662431 | time=4.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.662268 | time=4.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=2.662173 | time=5.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=255 | loss=2.662141 | time=5.8s | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 6579, vs. num we are initializing: 49542 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 6579 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.570991 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.475896 | time=0.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.474311 | time=1.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.474262 | time=1.6s | |
INFO:birdwatch.diligence_model:Low diligence final loss: 0.4743 | |
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object') | |
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1', | |
'internalNoteFactor1_max', 'internalNoteFactor1_median', | |
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig', | |
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig', | |
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_11 Low Diligence Reputation Model elapsed time: 8.76 secs (0.15 mins) | |
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFGroupScorer_11 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=110 | loss=2.408158 | time=42.7s | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 104455, vs. num we are initializing: 226673 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 104455 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.272076 | time=0.2s | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 494: noteStats = tagAggregates.merge(noteStats, on=c.noteIdKey, how="outer") | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:compute_scored_notes: compute tag aggregates elapsed time: 3.64 secs (0.06 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:82: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals.drop(columns_to_drop, inplace=True, axis=1) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:84: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ( | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:91: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:95: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:99: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 499: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on num_voters_interval: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.constants:compute_scored_notes: compute incorrect aggregates elapsed time: 3.43 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.65 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.12 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.75 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.64 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.262751 | time=12.3s | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.61 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.12 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.76 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRHInertia (v1.0) | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 496789 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 62811117 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 62811117 | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRHInertia (v1.0) elapsed time: 1.00 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRHInertia (v1.0) elapsed time: 1.67 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: TagFilter (v1.0) | |
INFO:birdwatch.scoring_rules:Candidate notes prior to tag filtering: 1778114 | |
INFO:birdwatch.scoring_rules:Checking note tags: | |
INFO:birdwatch.scoring_rules:notHelpfulOther | |
INFO:birdwatch.scoring_rules:notHelpfulIncorrect | |
INFO:birdwatch.scoring_rules:notHelpfulSourcesMissingOrUnreliable | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculationOrBias | |
INFO:birdwatch.scoring_rules:notHelpfulMissingKeyPoints | |
INFO:birdwatch.scoring_rules:notHelpfulOutdated | |
INFO:birdwatch.scoring_rules:notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:outlier filtering disabled for tag: notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:notHelpfulArgumentativeOrBiased | |
INFO:birdwatch.scoring_rules:notHelpfulOffTopic | |
INFO:birdwatch.scoring_rules:notHelpfulSpamHarassmentOrAbuse | |
INFO:birdwatch.scoring_rules:notHelpfulIrrelevantSources | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculation | |
INFO:birdwatch.scoring_rules:notHelpfulNoteNotNeeded | |
INFO:birdwatch.scoring_rules:Total {note, tag} pairs where tag filter logic triggered: 1705 | |
INFO:birdwatch.scoring_rules:Total unique notes impacted by tag filtering: 1163 | |
INFO:birdwatch.constants:Calling score_notes: TagFilter (v1.0) elapsed time: 3.03 secs (0.05 mins) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scoring_rules.py, in apply_scoring_rules, at line 1099: noteColumns = noteColumns.merge( | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:Applying scoring rule: TagFilter (v1.0) elapsed time: 3.63 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: CRHSuperThreshold (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: CRHSuperThreshold (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: CRHSuperThreshold (v1.0) elapsed time: 0.71 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: ElevatedCRHInertia (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: ElevatedCRHInertia (v1.0) elapsed time: 0.93 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: ElevatedCRHInertia (v1.0) elapsed time: 1.53 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterIncorrect (v1.0) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=055 | loss=0.262654 | time=21.9s | |
INFO:birdwatch.diligence_model:Low diligence final loss: 0.2627 | |
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object') | |
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1', | |
'internalNoteFactor1_max', 'internalNoteFactor1_median', | |
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig', | |
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig', | |
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_13 Low Diligence Reputation Model elapsed time: 89.30 secs (1.49 mins) | |
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFGroupScorer_13 | |
INFO:birdwatch.scoring_rules:Total notes impacted by incorrect filtering: 274 | |
INFO:birdwatch.constants:Calling score_notes: FilterIncorrect (v1.0) elapsed time: 2.91 secs (0.05 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterIncorrect (v1.0) elapsed time: 3.64 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLowDiligence (v1.0) | |
INFO:birdwatch.scoring_rules:Total notes impacted by low diligence filtering: 18296 | |
INFO:birdwatch.constants:Calling score_notes: FilterLowDiligence (v1.0) elapsed time: 3.38 secs (0.06 mins) | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 496867 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 62763814 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 62763814 | |
INFO:birdwatch.constants:Applying scoring rule: FilterLowDiligence (v1.0) elapsed time: 4.34 secs (0.07 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLargeFactor (v1.0) | |
INFO:birdwatch.scoring_rules:Total notes impacted by large factor filtering: 135 | |
INFO:birdwatch.constants:Calling score_notes: FilterLargeFactor (v1.0) elapsed time: 1.48 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterLargeFactor (v1.0) elapsed time: 2.12 secs (0.04 mins) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 382560, Notes: 1226896 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 43.421931443251914 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 139.25709431200335 | |
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 43.473940638371324 | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.matrix_factorization:epoch 0 0.12321729958057404 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09819003939628601 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.109748 | time=65.8s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 40.61 secs (0.68 mins) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 496789, Notes: 1321133 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
INFO:birdwatch.mf_base_scorer:sn cols: Index(['noteId', 'ratingWeight', 'notHelpfulOtherAdjusted', | |
'notHelpfulIncorrectAdjusted', | |
'notHelpfulSourcesMissingOrUnreliableAdjusted', | |
'notHelpfulOpinionSpeculationOrBiasAdjusted', | |
'notHelpfulMissingKeyPointsAdjusted', 'notHelpfulOutdatedAdjusted', | |
'notHelpfulHardToUnderstandAdjusted', | |
'notHelpfulArgumentativeOrBiasedAdjusted', 'notHelpfulOffTopicAdjusted', | |
'notHelpfulSpamHarassmentOrAbuseAdjusted', | |
'notHelpfulIrrelevantSourcesAdjusted', | |
'notHelpfulOpinionSpeculationAdjusted', | |
'notHelpfulNoteNotNeededAdjusted', 'notHelpfulOtherAdjustedRatio', | |
'notHelpfulIncorrectAdjustedRatio', | |
'notHelpfulSourcesMissingOrUnreliableAdjustedRatio', | |
'notHelpfulOpinionSpeculationOrBiasAdjustedRatio', | |
'notHelpfulMissingKeyPointsAdjustedRatio', | |
'notHelpfulOutdatedAdjustedRatio', | |
'notHelpfulHardToUnderstandAdjustedRatio', | |
'notHelpfulArgumentativeOrBiasedAdjustedRatio', | |
'notHelpfulOffTopicAdjustedRatio', | |
'notHelpfulSpamHarassmentOrAbuseAdjustedRatio', | |
'notHelpfulIrrelevantSourcesAdjustedRatio', | |
'notHelpfulOpinionSpeculationAdjustedRatio', | |
'notHelpfulNoteNotNeededAdjustedRatio', 'helpfulOther', | |
'helpfulInformative', 'helpfulClear', 'helpfulEmpathetic', | |
'helpfulGoodSources', 'helpfulUniqueContext', 'helpfulAddressesClaim', | |
'helpfulImportantContext', 'helpfulUnbiasedLanguage', 'notHelpfulOther', | |
'notHelpfulIncorrect', 'notHelpfulSourcesMissingOrUnreliable', | |
'notHelpfulOpinionSpeculationOrBias', 'notHelpfulMissingKeyPoints', | |
'notHelpfulOutdated', 'notHelpfulHardToUnderstand', | |
'notHelpfulArgumentativeOrBiased', 'notHelpfulOffTopic', | |
'notHelpfulSpamHarassmentOrAbuse', 'notHelpfulIrrelevantSources', | |
'notHelpfulOpinionSpeculation', 'notHelpfulNoteNotNeeded', 'numRatings', | |
'noteAuthorParticipantId', 'classification', 'currentStatus', | |
'internalNoteIntercept', 'internalNoteFactor1', | |
'lowDiligenceNoteIntercept', 'internalNoteFactor1_max', | |
'internalNoteFactor1_median', 'internalNoteFactor1_min', | |
'internalNoteFactor1_refit_orig', 'internalNoteIntercept_median', | |
'internalNoteIntercept_refit_orig', 'ratingCount_all', | |
'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'notHelpfulIncorrect_interval', 'p_incorrect_user_interval', | |
'num_voters_interval', 'tf_idf_incorrect_interval', | |
'internalRatingStatus', 'internalActiveRules', 'activeFilterTags', | |
'crhBool', 'crnhBool', 'awaitingBool'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_11 Final compute scored notes elapsed time: 74.75 secs (1.25 mins) | |
INFO:birdwatch.scorer:Postprocessing output for MFGroupScorer_11 | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 47.54337148492998 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 126.43419439641377 | |
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 47.6085012355786 | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.matrix_factorization:epoch 0 0.12263049185276031 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09743598848581314 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 496867, Notes: 1321116 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 47.50817793441303 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 126.31914375476737 | |
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 47.57325521545964 | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.matrix_factorization:epoch 0 0.12266939133405685 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09747980535030365 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.10959278047084808 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08253980427980423 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.108285 | time=131.5s | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 494: noteStats = tagAggregates.merge(noteStats, on=c.noteIdKey, how="outer") | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:compute_scored_notes: compute tag aggregates elapsed time: 74.88 secs (1.25 mins) | |
INFO:birdwatch.scorer: Original noteScores length: 1783629 | |
INFO:birdwatch.scorer: Final noteScores length: 8725 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in score_final, at line 1190: noteScores = noteScores.merge( | |
PandasTypeError: Output mismatch on numFinalRoundRatings: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_11 Postprocess output elapsed time: 68.04 secs (1.13 mins) | |
INFO:birdwatch.run_scoring:MFGroupScorer_10 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.matrix_factorization:epoch 20 0.11050929874181747 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08343624323606491 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10891925543546677 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08189026266336441 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.11047868430614471 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08340329676866531 | |
INFO:birdwatch.run_scoring:MFGroupScorer_10 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_10 run_scorer_parallelizable: Loading data elapsed time: 29.71 secs (0.50 mins) | |
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_10 set to: 4 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_10. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=0.108230 | time=198.1s | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:82: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals.drop(columns_to_drop, inplace=True, axis=1) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:84: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ( | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:91: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:95: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:99: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
INFO:birdwatch.matrix_factorization:Num epochs: 59 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 499: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on num_voters_interval: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.constants:compute_scored_notes: compute incorrect aggregates elapsed time: 66.12 secs (1.10 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:epoch 59 0.10882849991321564 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08167941868305206 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.16981279850006104 | |
INFO:birdwatch.scorer:MFCoreScorer Final helpfulness-filtered MF elapsed time: 214.65 secs (3.58 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.76 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.16 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.93 secs (0.02 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.05 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.77 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.19 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.95 secs (0.02 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRHInertia (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRHInertia (v1.0) elapsed time: 1.17 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRHInertia (v1.0) elapsed time: 1.85 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: TagFilter (v1.0) | |
INFO:birdwatch.scoring_rules:Candidate notes prior to tag filtering: 1761688 | |
INFO:birdwatch.scoring_rules:Checking note tags: | |
INFO:birdwatch.scoring_rules:notHelpfulOther | |
INFO:birdwatch.scoring_rules:notHelpfulIncorrect | |
INFO:birdwatch.scoring_rules:notHelpfulSourcesMissingOrUnreliable | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculationOrBias | |
INFO:birdwatch.scoring_rules:notHelpfulMissingKeyPoints | |
INFO:birdwatch.scoring_rules:notHelpfulOutdated | |
INFO:birdwatch.scoring_rules:notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:outlier filtering disabled for tag: notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:notHelpfulArgumentativeOrBiased | |
INFO:birdwatch.scoring_rules:notHelpfulOffTopic | |
INFO:birdwatch.scoring_rules:notHelpfulSpamHarassmentOrAbuse | |
INFO:birdwatch.scoring_rules:notHelpfulIrrelevantSources | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculation | |
INFO:birdwatch.scoring_rules:notHelpfulNoteNotNeeded | |
INFO:birdwatch.scoring_rules:Total {note, tag} pairs where tag filter logic triggered: 109750 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10988125205039978 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08280817419290543 | |
INFO:birdwatch.scoring_rules:Total unique notes impacted by tag filtering: 55861 | |
INFO:birdwatch.constants:Calling score_notes: TagFilter (v1.0) elapsed time: 4.55 secs (0.08 mins) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scoring_rules.py, in apply_scoring_rules, at line 1099: noteColumns = noteColumns.merge( | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:Applying scoring rule: TagFilter (v1.0) elapsed time: 5.37 secs (0.09 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: CRHSuperThreshold (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: CRHSuperThreshold (v1.0) elapsed time: 0.19 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: CRHSuperThreshold (v1.0) elapsed time: 0.93 secs (0.02 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: ElevatedCRHInertia (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: ElevatedCRHInertia (v1.0) elapsed time: 1.27 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: ElevatedCRHInertia (v1.0) elapsed time: 1.94 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterIncorrect (v1.0) | |
INFO:birdwatch.scorer: Ratings after group filter: 1008102 | |
INFO:birdwatch.scorer:MFGroupScorer_10 Filter input elapsed time: 43.35 secs (0.72 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note: | |
Num Ratings: 517761, Num Unique Notes Rated: 45869, Num Unique Raters: 9603 | |
INFO:birdwatch.scorer:MFGroupScorer_10 Prepare ratings elapsed time: 0.33 secs (0.01 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)` | |
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey] | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 4807 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 31122 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 5471 | |
INFO:birdwatch.scoring_rules:Total notes impacted by incorrect filtering: 12312 | |
INFO:birdwatch.constants:Calling score_notes: FilterIncorrect (v1.0) elapsed time: 2.44 secs (0.04 mins) | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 4807 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 352239 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 352239 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 4807, Notes: 45758 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 7.697867039643341 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 73.27626378198461 | |
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 7.839522009181745 | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.matrix_factorization:epoch 0 0.15036438405513763 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1034027710556984 | |
INFO:birdwatch.constants:Applying scoring rule: FilterIncorrect (v1.0) elapsed time: 3.11 secs (0.05 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLowDiligence (v1.0) | |
INFO:birdwatch.matrix_factorization:epoch 20 0.09833598136901855 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06338489800691605 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.09538590162992477 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06136326491832733 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09507506340742111 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.061440546065568924 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.09503751993179321 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06147640198469162 | |
INFO:birdwatch.matrix_factorization:Num epochs: 81 | |
INFO:birdwatch.matrix_factorization:epoch 81 0.09503751993179321 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06147640198469162 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1765308380126953 | |
INFO:birdwatch.scorer:MFGroupScorer_10 Final helpfulness-filtered MF elapsed time: 2.55 secs (0.04 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_10 final scoring, about to call diligence with 352239 final round ratings. | |
INFO:birdwatch.scoring_rules:Total notes impacted by low diligence filtering: 109589 | |
INFO:birdwatch.constants:Calling score_notes: FilterLowDiligence (v1.0) elapsed time: 2.25 secs (0.04 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key] | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key] | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
                    noteId  internalNoteIntercept  internalNoteFactor1  \
0      1653111205429403666               1.751741            -0.175008
1      1661796202554294297              -0.418910             3.891872
2      1715444846586929540              -1.264926            -0.703209
3      1738503882395844941              -6.175349            -0.802291
4      1738528131655323997              -3.381059            -0.230357
...                    ...                    ...                  ...
44431  1781302829556130237               0.454817            -1.483116
44432  1870681504772186619               1.066034            -2.238522
44433  1830682877715247309              -0.247141             0.875170
44434  1736435011405181005              -0.215457            -0.906303
44435  1828123440026378402              -0.411297            -0.422157
       internalNoteInterceptRound2
0                         1.751741
1                        -0.418910
2                        -1.264926
3                        -6.175349
4                        -3.381059
...                            ...
44431                     0.454817
44432                     1.066034
44433                    -0.247141
44434                    -0.215457
44435                    -0.411297
[44436 rows x 4 columns],
raterInitState:
                                      raterParticipantId  \
0      000045A5FA0CF004F68CBF2913506C37D540CF48522D33...
1      00018DBB934257251EBCEE91D0722C71B7DD592A571398...
2      00029D1FDD352D79B5073189C3F2BDF6377581F50D66C1...
3      00037E5A04D7781E19E5AAF559E14512FF17E7F76C30AF...
4      00053CDCAC04E3692F4A01305C8F3D093CCE221157D539...
...                                                  ...
31117  FFE9E0E39C0049AD113CEF0AB5178393F13B15C4E7B31C...
31118  FFF104BC8D2B5E53432FF3E605B5D5D76EDECE29AFA0F5...
31119  FFF1316D167C80F6D36C904E952D720D8E8DAE052288D1...
31120  FFF5A46494A3BDEC6FFF8A38A777E53484648B186FCD76...
31121  FFFEB058BCC25277E2662DD3E8C0506FB1B23BA4D965EA...
       internalRaterIntercept  internalRaterFactor1  internalRaterReputation  \
0                         NaN                   NaN                      NaN
1                         NaN                   NaN                      NaN
2                         NaN                   NaN                      NaN
3                         NaN                   NaN                      NaN
4                         NaN                   NaN                      NaN
...                       ...                   ...                      ...
31117                0.275395             -1.040295                 0.429025
31118                0.059025              1.722504                 0.328327
31119                     NaN                   NaN                      NaN
31120                     NaN                   NaN                      NaN
31121                     NaN                   NaN                      NaN
       internalRaterInterceptRound2
0                               NaN
1                               NaN
2                               NaN
3                               NaN
4                               NaN
...                             ...
31117                      0.275395
31118                      0.059025
31119                           NaN
31120                           NaN
31121                           NaN
[31122 rows x 5 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 4807, vs. num we are initializing: 31122 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 4807 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 4807, vs. num we are initializing: 31122 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 4807 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 4807, vs. num we are initializing: 31122 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 4807 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 45758, vs. num we are initializing: 44436 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 44532 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 1226 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 45758, vs. num we are initializing: 44436 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 44532 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 1226 | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=7.466931 | time=0.0s | |
INFO:birdwatch.constants:Applying scoring rule: FilterLowDiligence (v1.0) elapsed time: 2.96 secs (0.05 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLargeFactor (v1.0) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.657639 | time=0.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.617040 | time=0.8s | |
INFO:birdwatch.scoring_rules:Total notes impacted by large factor filtering: 338 | |
INFO:birdwatch.constants:Calling score_notes: FilterLargeFactor (v1.0) elapsed time: 0.80 secs (0.01 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.610772 | time=1.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.609813 | time=1.6s | |
INFO:birdwatch.constants:Applying scoring rule: FilterLargeFactor (v1.0) elapsed time: 1.48 secs (0.02 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.609548 | time=2.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.609435 | time=2.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=200 | loss=2.609392 | time=2.6s | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 4807, vs. num we are initializing: 31122 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 4807 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.590790 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.489094 | time=0.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.487533 | time=0.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=080 | loss=0.487471 | time=1.0s | |
INFO:birdwatch.diligence_model:Low diligence final loss: 0.4875 | |
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object') | |
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1', | |
'internalNoteFactor1_max', 'internalNoteFactor1_median', | |
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig', | |
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig', | |
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_10 Low Diligence Reputation Model elapsed time: 4.40 secs (0.07 mins) | |
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFGroupScorer_10 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.109850212931633 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08277617394924164 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 494: noteStats = tagAggregates.merge(noteStats, on=c.noteIdKey, how="outer") | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:compute_scored_notes: compute tag aggregates elapsed time: 2.83 secs (0.05 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:82: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals.drop(columns_to_drop, inplace=True, axis=1) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:84: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ( | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:91: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:95: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:99: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 499: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on num_voters_interval: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.constants:compute_scored_notes: compute incorrect aggregates elapsed time: 3.12 secs (0.05 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.75 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.16 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.84 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.66 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.67 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.82 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRHInertia (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRHInertia (v1.0) elapsed time: 1.06 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRHInertia (v1.0) elapsed time: 1.71 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: TagFilter (v1.0) | |
INFO:birdwatch.scoring_rules:Candidate notes prior to tag filtering: 1780829 | |
INFO:birdwatch.scoring_rules:Checking note tags: | |
INFO:birdwatch.scoring_rules:notHelpfulOther | |
INFO:birdwatch.scoring_rules:notHelpfulIncorrect | |
INFO:birdwatch.scoring_rules:notHelpfulSourcesMissingOrUnreliable | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculationOrBias | |
INFO:birdwatch.scoring_rules:notHelpfulMissingKeyPoints | |
INFO:birdwatch.scoring_rules:notHelpfulOutdated | |
INFO:birdwatch.scoring_rules:notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:outlier filtering disabled for tag: notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:notHelpfulArgumentativeOrBiased | |
INFO:birdwatch.scoring_rules:notHelpfulOffTopic | |
INFO:birdwatch.scoring_rules:notHelpfulSpamHarassmentOrAbuse | |
INFO:birdwatch.scoring_rules:notHelpfulIrrelevantSources | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculation | |
INFO:birdwatch.scoring_rules:notHelpfulNoteNotNeeded | |
INFO:birdwatch.scoring_rules:Total {note, tag} pairs where tag filter logic triggered: 706 | |
INFO:birdwatch.scoring_rules:Total unique notes impacted by tag filtering: 407 | |
INFO:birdwatch.constants:Calling score_notes: TagFilter (v1.0) elapsed time: 3.35 secs (0.06 mins) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scoring_rules.py, in apply_scoring_rules, at line 1099: noteColumns = noteColumns.merge( | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:Applying scoring rule: TagFilter (v1.0) elapsed time: 4.04 secs (0.07 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: CRHSuperThreshold (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: CRHSuperThreshold (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: CRHSuperThreshold (v1.0) elapsed time: 0.83 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: ElevatedCRHInertia (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: ElevatedCRHInertia (v1.0) elapsed time: 1.01 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: ElevatedCRHInertia (v1.0) elapsed time: 1.68 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterIncorrect (v1.0) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/pseudo_raters.py, in _add_extreme_raters_to_id_maps_and_params, at line 135: self.raterIdMapWithExtreme = pd.concat( | |
PandasTypeError: DataFrame concat on raterParticipantId: output=object inputs=[dtype('O'), dtype('int64')] (allowed) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/pseudo_raters.py, in _add_extreme_raters_to_id_maps_and_params, at line 151: self.raterParamsWithExtreme = pd.concat( | |
PandasTypeError: DataFrame concat on raterParticipantId: output=object inputs=[dtype('O'), dtype('int64')] (allowed) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/pseudo_raters.py, in _add_extreme_raters_to_id_maps_and_params, at line 135: self.raterIdMapWithExtreme = pd.concat( | |
PandasTypeError: DataFrame concat on raterParticipantId: output=object inputs=[dtype('O'), dtype('int64')] (allowed) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/pseudo_raters.py, in _add_extreme_raters_to_id_maps_and_params, at line 151: self.raterParamsWithExtreme = pd.concat( | |
PandasTypeError: DataFrame concat on raterParticipantId: output=object inputs=[dtype('O'), dtype('int64')] (allowed) | |
PandasTypeError: DataFrame concat on internalRaterFactor1: output=float64 inputs=[dtype('float32'), dtype('float64')] (allowed) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/pseudo_raters.py, in _add_extreme_raters_to_id_maps_and_params, at line 135: self.raterIdMapWithExtreme = pd.concat( | |
PandasTypeError: DataFrame concat on raterParticipantId: output=object inputs=[dtype('O'), dtype('int64')] (allowed) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/pseudo_raters.py, in _add_extreme_raters_to_id_maps_and_params, at line 151: self.raterParamsWithExtreme = pd.concat( | |
PandasTypeError: DataFrame concat on raterParticipantId: output=object inputs=[dtype('O'), dtype('int64')] (allowed) | |
PandasTypeError: DataFrame concat on internalRaterFactor1: output=float64 inputs=[dtype('float64'), dtype('float32')] (allowed) | |
INFO:birdwatch.constants:Pseudoraters: prepare data elapsed time: 0.55 secs (0.01 mins) | |
INFO:birdwatch.pseudo_raters:------------------ | |
INFO:birdwatch.pseudo_raters:Re-scoring all notes with extra rating added: {'internalRaterIntercept': None, 'internalRaterFactor1': None, 'helpfulNum': None} | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 382560, Notes: 1226896 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
INFO:birdwatch.scoring_rules:Total notes impacted by incorrect filtering: 146 | |
INFO:birdwatch.constants:Calling score_notes: FilterIncorrect (v1.0) elapsed time: 3.29 secs (0.05 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INIT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/pseudo_raters.py, in _check_note_parameters_same, at line 90: assert (noteParamsFromNewModel == self.noteParams).all().all() | |
PandasTypeError: Type expectation mismatch on noteId: found=bool expected=int64 | |
INFO:birdwatch.constants:Applying scoring rule: FilterIncorrect (v1.0) elapsed time: 4.00 secs (0.07 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLowDiligence (v1.0) | |
INFO:birdwatch.scoring_rules:Total notes impacted by low diligence filtering: 8809 | |
INFO:birdwatch.constants:Calling score_notes: FilterLowDiligence (v1.0) elapsed time: 3.17 secs (0.05 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterLowDiligence (v1.0) elapsed time: 3.88 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLargeFactor (v1.0) | |
INFO:birdwatch.matrix_factorization:epoch 0 0.12860751152038574 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10203398019075394 | |
INFO:birdwatch.scoring_rules:Total notes impacted by large factor filtering: 58 | |
INFO:birdwatch.constants:Calling score_notes: FilterLargeFactor (v1.0) elapsed time: 0.89 secs (0.01 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterLargeFactor (v1.0) elapsed time: 1.49 secs (0.02 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=0.108227 | time=266.3s | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 40.80 secs (0.68 mins) | |
INFO:birdwatch.mf_base_scorer:sn cols: Index(['noteId', 'ratingWeight', 'notHelpfulOtherAdjusted', | |
'notHelpfulIncorrectAdjusted', | |
'notHelpfulSourcesMissingOrUnreliableAdjusted', | |
'notHelpfulOpinionSpeculationOrBiasAdjusted', | |
'notHelpfulMissingKeyPointsAdjusted', 'notHelpfulOutdatedAdjusted', | |
'notHelpfulHardToUnderstandAdjusted', | |
'notHelpfulArgumentativeOrBiasedAdjusted', 'notHelpfulOffTopicAdjusted', | |
'notHelpfulSpamHarassmentOrAbuseAdjusted', | |
'notHelpfulIrrelevantSourcesAdjusted', | |
'notHelpfulOpinionSpeculationAdjusted', | |
'notHelpfulNoteNotNeededAdjusted', 'notHelpfulOtherAdjustedRatio', | |
'notHelpfulIncorrectAdjustedRatio', | |
'notHelpfulSourcesMissingOrUnreliableAdjustedRatio', | |
'notHelpfulOpinionSpeculationOrBiasAdjustedRatio', | |
'notHelpfulMissingKeyPointsAdjustedRatio', | |
'notHelpfulOutdatedAdjustedRatio', | |
'notHelpfulHardToUnderstandAdjustedRatio', | |
'notHelpfulArgumentativeOrBiasedAdjustedRatio', | |
'notHelpfulOffTopicAdjustedRatio', | |
'notHelpfulSpamHarassmentOrAbuseAdjustedRatio', | |
'notHelpfulIrrelevantSourcesAdjustedRatio', | |
'notHelpfulOpinionSpeculationAdjustedRatio', | |
'notHelpfulNoteNotNeededAdjustedRatio', 'helpfulOther', | |
'helpfulInformative', 'helpfulClear', 'helpfulEmpathetic', | |
'helpfulGoodSources', 'helpfulUniqueContext', 'helpfulAddressesClaim', | |
'helpfulImportantContext', 'helpfulUnbiasedLanguage', 'notHelpfulOther', | |
'notHelpfulIncorrect', 'notHelpfulSourcesMissingOrUnreliable', | |
'notHelpfulOpinionSpeculationOrBias', 'notHelpfulMissingKeyPoints', | |
'notHelpfulOutdated', 'notHelpfulHardToUnderstand', | |
'notHelpfulArgumentativeOrBiased', 'notHelpfulOffTopic', | |
'notHelpfulSpamHarassmentOrAbuse', 'notHelpfulIrrelevantSources', | |
'notHelpfulOpinionSpeculation', 'notHelpfulNoteNotNeeded', 'numRatings', | |
'noteAuthorParticipantId', 'classification', 'currentStatus', | |
'internalNoteIntercept', 'internalNoteFactor1', | |
'lowDiligenceNoteIntercept', 'internalNoteFactor1_max', | |
'internalNoteFactor1_median', 'internalNoteFactor1_min', | |
'internalNoteFactor1_refit_orig', 'internalNoteIntercept_median', | |
'internalNoteIntercept_refit_orig', 'ratingCount_all', | |
'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'notHelpfulIncorrect_interval', 'p_incorrect_user_interval', | |
'num_voters_interval', 'tf_idf_incorrect_interval', | |
'internalRatingStatus', 'internalActiveRules', 'activeFilterTags', | |
'crhBool', 'crnhBool', 'awaitingBool'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_13 Final compute scored notes elapsed time: 240.83 secs (4.01 mins) | |
INFO:birdwatch.scorer:Postprocessing output for MFGroupScorer_13 | |
INFO:birdwatch.matrix_factorization:Num epochs: 59 | |
INFO:birdwatch.matrix_factorization:epoch 59 0.10980002582073212 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08260970562696457 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17136713862419128 | |
INFO:birdwatch.scorer:MFExpansionPlusScorer Final helpfulness-filtered MF elapsed time: 254.85 secs (4.25 mins) | |
INFO:birdwatch.mf_base_scorer:In MFExpansionPlusScorer final scoring, about to call diligence with 62811117 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:epoch=130 | loss=0.108227 | time=288.7s | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 599301, vs. num we are initializing: 599301 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 599301 | |
INFO:birdwatch.matrix_factorization:Num epochs: 59 | |
INFO:birdwatch.matrix_factorization:epoch 59 0.10976870357990265 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08257711678743362 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1713782697916031 | |
INFO:birdwatch.scorer:MFExpansionScorer Final helpfulness-filtered MF elapsed time: 255.27 secs (4.25 mins) | |
INFO:birdwatch.mf_base_scorer:In MFExpansionScorer final scoring, about to call diligence with 62763814 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.009235 | time=1.0s | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 41.94 secs (0.70 mins) | |
INFO:birdwatch.mf_base_scorer:sn cols: Index(['noteId', 'ratingWeight', 'notHelpfulOtherAdjusted', | |
'notHelpfulIncorrectAdjusted', | |
'notHelpfulSourcesMissingOrUnreliableAdjusted', | |
'notHelpfulOpinionSpeculationOrBiasAdjusted', | |
'notHelpfulMissingKeyPointsAdjusted', 'notHelpfulOutdatedAdjusted', | |
'notHelpfulHardToUnderstandAdjusted', | |
'notHelpfulArgumentativeOrBiasedAdjusted', 'notHelpfulOffTopicAdjusted', | |
'notHelpfulSpamHarassmentOrAbuseAdjusted', | |
'notHelpfulIrrelevantSourcesAdjusted', | |
'notHelpfulOpinionSpeculationAdjusted', | |
'notHelpfulNoteNotNeededAdjusted', 'notHelpfulOtherAdjustedRatio', | |
'notHelpfulIncorrectAdjustedRatio', | |
'notHelpfulSourcesMissingOrUnreliableAdjustedRatio', | |
'notHelpfulOpinionSpeculationOrBiasAdjustedRatio', | |
'notHelpfulMissingKeyPointsAdjustedRatio', | |
'notHelpfulOutdatedAdjustedRatio', | |
'notHelpfulHardToUnderstandAdjustedRatio', | |
'notHelpfulArgumentativeOrBiasedAdjustedRatio', | |
'notHelpfulOffTopicAdjustedRatio', | |
'notHelpfulSpamHarassmentOrAbuseAdjustedRatio', | |
'notHelpfulIrrelevantSourcesAdjustedRatio', | |
'notHelpfulOpinionSpeculationAdjustedRatio', | |
'notHelpfulNoteNotNeededAdjustedRatio', 'helpfulOther', | |
'helpfulInformative', 'helpfulClear', 'helpfulEmpathetic', | |
'helpfulGoodSources', 'helpfulUniqueContext', 'helpfulAddressesClaim', | |
'helpfulImportantContext', 'helpfulUnbiasedLanguage', 'notHelpfulOther', | |
'notHelpfulIncorrect', 'notHelpfulSourcesMissingOrUnreliable', | |
'notHelpfulOpinionSpeculationOrBias', 'notHelpfulMissingKeyPoints', | |
'notHelpfulOutdated', 'notHelpfulHardToUnderstand', | |
'notHelpfulArgumentativeOrBiased', 'notHelpfulOffTopic', | |
'notHelpfulSpamHarassmentOrAbuse', 'notHelpfulIrrelevantSources', | |
'notHelpfulOpinionSpeculation', 'notHelpfulNoteNotNeeded', 'numRatings', | |
'noteAuthorParticipantId', 'classification', 'currentStatus', | |
'internalNoteIntercept', 'internalNoteFactor1', | |
'lowDiligenceNoteIntercept', 'internalNoteFactor1_max', | |
'internalNoteFactor1_median', 'internalNoteFactor1_min', | |
'internalNoteFactor1_refit_orig', 'internalNoteIntercept_median', | |
'internalNoteIntercept_refit_orig', 'ratingCount_all', | |
'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'notHelpfulIncorrect_interval', 'p_incorrect_user_interval', | |
'num_voters_interval', 'tf_idf_incorrect_interval', | |
'internalRatingStatus', 'internalActiveRules', 'activeFilterTags', | |
'crhBool', 'crnhBool', 'awaitingBool'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_10 Final compute scored notes elapsed time: 74.72 secs (1.25 mins) | |
INFO:birdwatch.scorer:Postprocessing output for MFGroupScorer_10 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.10987015813589096 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08298537880182266 | |
INFO:birdwatch.scorer: Original noteScores length: 1783629 | |
INFO:birdwatch.scorer: Final noteScores length: 116962 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in score_final, at line 1190: noteScores = noteScores.merge( | |
PandasTypeError: Output mismatch on numFinalRoundRatings: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_13 Postprocess output elapsed time: 73.06 secs (1.22 mins) | |
INFO:birdwatch.run_scoring:MFGroupScorer_9 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key] | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key] | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
                      noteId  ...  internalNoteInterceptRound2
0        1715437541212520617  ...                     1.927415
1        1722158725807378589  ...                    -1.758197
2        1724462438022554032  ...                    -0.870198
3        1724471553906131352  ...                    -1.076510
4        1733114336250380782  ...                    -1.699260
...                      ...  ...                          ...
1319215  1739514358324183339  ...                    -0.159068
1319216  1876071888876966335  ...                    -0.228524
1319217  1737522200616263780  ...                     0.886244
1319218  1872611622075723818  ...                    -0.243515
1319219  1801082722632577292  ...                    -0.140561
[1319220 rows x 4 columns],
raterInitState:
                                       raterParticipantId  ...  internalRaterInterceptRound2
0       000011269AD6F327AED0F4086A732B4052F9D28E8791E1...  ...                   -0.598722
1       00003B703F86036C51F4F4B4C9F77B00C92D882421DA73...  ...                   -0.173721
2       000045A5FA0CF004F68CBF2913506C37D540CF48522D33...  ...                   -0.653616
3       00004D45B2AFE9EA96333B280009DCC621851088264E8F...  ...                         NaN
4       00005300B9017670433392BF6767238D54E058EC25D5C5...  ...                   -0.194291
...                                                   ...  ...                         ...
722507  FFFFBBAB3C66ABB4DBC2A3B486C3C673345C89B5858465...  ...                    0.168351
722508  FFFFC46B8555A97065DB39F7D600C8BB643F7F3EBD810E...  ...                   -0.282261
722509  FFFFC819886B2F837503D840D59EE8321A835AAF2B5C1E...  ...                   -0.057744
722510  FFFFD54D8094D7620A7C3E162F98198FBDBD3401A4F2FB...  ...                         NaN
722511  FFFFFE8909485374E33854B934713713CAC93CDB50C9D0...  ...                   -0.317479
[722512 rows x 5 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 496789, vs. num we are initializing: 722512 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.007130 | time=66.7s | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 496789 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 496789, vs. num we are initializing: 722512 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 496789 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 496789, vs. num we are initializing: 722512 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 496789 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 1321133, vs. num we are initializing: 1319220 | |
INFO:birdwatch.scorer: Original noteScores length: 1783629 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 1262946 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 58187 | |
INFO:birdwatch.scorer: Final noteScores length: 5227 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 1321133, vs. num we are initializing: 1319220 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in score_final, at line 1190: noteScores = noteScores.merge( | |
PandasTypeError: Output mismatch on numFinalRoundRatings: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_10 Postprocess output elapsed time: 59.62 secs (0.99 mins) | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 1262946 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 58187 | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.964743 | time=0.9s | |
INFO:birdwatch.run_scoring:MFGroupScorer_8 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key] | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key] | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
                      noteId  ...  internalNoteInterceptRound2
0        1715437541212520617  ...                     1.916127
1        1722158725807378589  ...                    -1.758119
2        1724462438022554032  ...                    -0.910524
3        1724471553906131352  ...                    -1.044411
4        1733114336250380782  ...                    -1.693384
...                      ...  ...                          ...
1319198  1739514358324183339  ...                    -0.160440
1319199  1876071888876966335  ...                    -0.227913
1319200  1737522200616263780  ...                     0.888188
1319201  1872611622075723818  ...                    -0.243491
1319202  1801082722632577292  ...                    -0.141208
[1319203 rows x 4 columns],
raterInitState:
                                       raterParticipantId  ...  internalRaterInterceptRound2
0       000011269AD6F327AED0F4086A732B4052F9D28E8791E1...  ...                   -0.599154
1       00003B703F86036C51F4F4B4C9F77B00C92D882421DA73...  ...                   -0.173048
2       000045A5FA0CF004F68CBF2913506C37D540CF48522D33...  ...                   -0.657185
3       00004D45B2AFE9EA96333B280009DCC621851088264E8F...  ...                         NaN
4       00005300B9017670433392BF6767238D54E058EC25D5C5...  ...                   -0.187659
...                                                   ...  ...                         ...
722839  FFFFBBAB3C66ABB4DBC2A3B486C3C673345C89B5858465...  ...                    0.169876
722840  FFFFC46B8555A97065DB39F7D600C8BB643F7F3EBD810E...  ...                   -0.283637
722841  FFFFC819886B2F837503D840D59EE8321A835AAF2B5C1E...  ...                   -0.057119
722842  FFFFD54D8094D7620A7C3E162F98198FBDBD3401A4F2FB...  ...                         NaN
722843  FFFFFE8909485374E33854B934713713CAC93CDB50C9D0...  ...                   -0.319418
[722844 rows x 5 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 496867, vs. num we are initializing: 722844 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 496867 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 496867, vs. num we are initializing: 722844 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 496867 | |
INFO:birdwatch.run_scoring:MFGroupScorer_9 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_9 run_scorer_parallelizable: Loading data elapsed time: 29.30 secs (0.49 mins) | |
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_9 set to: 4 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 496867, vs. num we are initializing: 722844 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_9. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 496867 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10894443839788437 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08187822252511978 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 1321116, vs. num we are initializing: 1319203 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 1262928 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 58188 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 1321116, vs. num we are initializing: 1319203 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 1262928 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 58188 | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.967510 | time=1.0s | |
INFO:birdwatch.run_scoring:MFGroupScorer_8 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_8 run_scorer_parallelizable: Loading data elapsed time: 29.58 secs (0.49 mins) | |
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_8 set to: 4 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_8. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.441855 | time=50.5s | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.007113 | time=133.2s | |
INFO:birdwatch.scorer: Ratings after group filter: 5652192 | |
INFO:birdwatch.scorer:MFGroupScorer_9 Filter input elapsed time: 53.26 secs (0.89 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.441232 | time=49.8s | |
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note: | |
Num Ratings: 5005660, Num Unique Notes Rated: 161620, Num Unique Raters: 52729 | |
INFO:birdwatch.scorer:MFGroupScorer_9 Prepare ratings elapsed time: 3.03 secs (0.05 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)` | |
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey] | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10881444066762924 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08172127604484558 | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 28554 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 90695 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 30820 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 28554 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 3198252 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 3198252 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 28554, Notes: 161453 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.scorer: Ratings after group filter: 769482 | |
INFO:birdwatch.scorer:MFGroupScorer_8 Filter input elapsed time: 45.71 secs (0.76 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 19.80918285816925 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 112.00714435805841 | |
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 19.923293240302538 | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note: | |
Num Ratings: 294753, Num Unique Notes Rated: 35033, Num Unique Raters: 5227 | |
INFO:birdwatch.scorer:MFGroupScorer_8 Prepare ratings elapsed time: 0.23 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:epoch 0 0.13769452273845673 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11265421658754349 | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)` | |
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey] | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 2677 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 22924 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 2951 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 2677 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 230235 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 230235 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 2677, Notes: 35021 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 6.574198338139973 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 86.00485618229361 | |
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 6.69694819020582 | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.matrix_factorization:epoch 0 0.152942955493927 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10540663450956345 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.09669956564903259 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06090006232261658 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.09332355856895447 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05876016616821289 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09292444586753845 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05885151028633118 | |
INFO:birdwatch.matrix_factorization:Num epochs: 79 | |
INFO:birdwatch.matrix_factorization:epoch 79 0.0928746834397316 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05890684574842453 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17319533228874207 | |
INFO:birdwatch.scorer:MFGroupScorer_8 Final helpfulness-filtered MF elapsed time: 1.56 secs (0.03 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_8 final scoring, about to call diligence with 230235 final round ratings. | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key] | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key] | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
                    noteId  internalNoteIntercept  internalNoteFactor1  \
0      1711478693334266247               5.038938            -0.922702
1      1713286202017607729               0.146843             2.817860
2      1719803084530946467               3.737844             0.192020
3      1720456228927606963               3.887522            -0.387876
4      1727107233727799766              -2.098484             0.107306
...                    ...                    ...                  ...
33811  1715792226893152494              -0.653895             0.622179
33812  1715795041388515539              -0.653920             0.626903
33813  1831661420343120067               1.035051             2.164517
33814  1732927442673607038              -0.162533             0.746689
33815  1748275272611254440              -0.196774             0.803683
       internalNoteInterceptRound2
0                         5.038938
1                         0.146843
2                         3.737844
3                         3.887522
4                        -2.098484
...                            ...
33811                    -0.653895
33812                    -0.653920
33813                     1.035051
33814                    -0.162533
33815                    -0.196774
[33816 rows x 4 columns], | |
raterInitState: | |
                                      raterParticipantId  \
0      000045A5FA0CF004F68CBF2913506C37D540CF48522D33...
1      000332634A6A64C51BA706D66615B87D74D34B3465D3CD...
2      0005983E6E18862483AB372C5B61FEBC1F8A573E7701F9...
3      000A0CE0A7410288C107822B15D2B35C5E95715EA946E7...
4      00177CE102355982315EED42EADA601B04A6112E029004...
...                                                  ...
22919  FFE894CCE08EAD722CB39396FBE0AFC5E05C9C9B9E3721...
22920  FFEFEEF7E6B2DCB450856DBBB9F7EF303369C610B38A42...
22921  FFF32E6FDAD8CA20E1F78638046B1E3D95B838103AE629...
22922  FFF5A46494A3BDEC6FFF8A38A777E53484648B186FCD76...
22923  FFFEB058BCC25277E2662DD3E8C0506FB1B23BA4D965EA...
       internalRaterIntercept  internalRaterFactor1  internalRaterReputation  \
0                         NaN                   NaN                      NaN
1                         NaN                   NaN                      NaN
2                         NaN                   NaN                      NaN
3                    0.564228             -2.569979                 0.426953
4                         NaN                   NaN                      NaN
...                       ...                   ...                      ...
22919                     NaN                   NaN                      NaN
22920                     NaN                   NaN                      NaN
22921                     NaN                   NaN                      NaN
22922                     NaN                   NaN                      NaN
22923                     NaN                   NaN                      NaN
       internalRaterInterceptRound2
0                               NaN
1                               NaN
2                               NaN
3                          0.564228
4                               NaN
...                             ...
22919                           NaN
22920                           NaN
22921                           NaN
22922                           NaN
22923                           NaN
[22924 rows x 5 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 2677, vs. num we are initializing: 22924 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 2677 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 2677, vs. num we are initializing: 22924 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 2677 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 2677, vs. num we are initializing: 22924 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 2677 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 35021, vs. num we are initializing: 33816 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 33982 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 1039 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 35021, vs. num we are initializing: 33816 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 33982 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 1039 | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=9.341843 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.773442 | time=0.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.719257 | time=0.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.709303 | time=1.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.707230 | time=1.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.706447 | time=1.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.706050 | time=1.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.705828 | time=2.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=2.705698 | time=2.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=2.705621 | time=2.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=275 | loss=2.705611 | time=2.6s | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 2677, vs. num we are initializing: 22924 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 2677 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.651682 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.509360 | time=0.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.507832 | time=0.5s | |
INFO:birdwatch.matrix_factorization:epoch 20 0.10029968619346619 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07119875401258469 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=085 | loss=0.507755 | time=0.7s | |
INFO:birdwatch.diligence_model:Low diligence final loss: 0.5078 | |
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object') | |
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1', | |
'internalNoteFactor1_max', 'internalNoteFactor1_median', | |
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig', | |
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig', | |
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_8 Low Diligence Reputation Model elapsed time: 3.79 secs (0.06 mins) | |
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFGroupScorer_8 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.matrix_factorization:Num epochs: 66 | |
INFO:birdwatch.matrix_factorization:epoch 66 0.10880712419748306 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08169487863779068 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.09814633429050446 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06893766671419144 | |
INFO:birdwatch.constants:Pseudo: fit all notes with raters constant elapsed time: 198.61 secs (3.31 mins) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 494: noteStats = tagAggregates.merge(noteStats, on=c.noteIdKey, how="outer") | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:compute_scored_notes: compute tag aggregates elapsed time: 2.60 secs (0.04 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:82: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals.drop(columns_to_drop, inplace=True, axis=1) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:84: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ( | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:91: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:95: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:99: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 499: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on num_voters_interval: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.constants:compute_scored_notes: compute incorrect aggregates elapsed time: 2.76 secs (0.05 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.67 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09791436791419983 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06869280338287354 | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:Num epochs: 62 | |
INFO:birdwatch.matrix_factorization:epoch 62 0.09791429340839386 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06869802623987198 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.171379953622818 | |
INFO:birdwatch.scorer:MFGroupScorer_9 Final helpfulness-filtered MF elapsed time: 19.79 secs (0.33 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_9 final scoring, about to call diligence with 3198252 final round ratings. | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.79 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.67 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.70 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.77 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRHInertia (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRHInertia (v1.0) elapsed time: 0.99 secs (0.02 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key] | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key] | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
                     noteId  internalNoteIntercept  internalNoteFactor1  \
0       1642506152822079490              -2.290910             1.168253
1       1644889840532566017               0.475048             2.483786
2       1644890766915796992              -0.993903             1.550431
3       1649616502188912641              -1.142338             0.081854
4       1649621727880839168              -1.210350             0.853288
...                     ...                    ...                  ...
160372  1835100035366682908              -0.387279            -0.196521
160373  1713350331864625656               1.318001             1.772401
160374  1836274146642153547              -0.274244            -0.647504
160375  1767562540320543193              -0.294664            -0.516813
160376  1872611622075723818              -0.430664            -0.299931
        internalNoteInterceptRound2
0                         -2.290910
1                          0.475048
2                         -0.993903
3                         -1.142338
4                         -1.210350
...                             ...
160372                    -0.387279
160373                     1.318001
160374                    -0.274244
160375                    -0.294664
160376                    -0.430664
[160377 rows x 4 columns], | |
raterInitState: | |
                                      raterParticipantId  \
0      000045A5FA0CF004F68CBF2913506C37D540CF48522D33...
1      00018DBB934257251EBCEE91D0722C71B7DD592A571398...
2      0002725E706CF18C040E21F30CE2D39994513C3BB8CF58...
3      00029D1FDD352D79B5073189C3F2BDF6377581F50D66C1...
4      0002D1E11A8EA1E4B25048FA9D117406CE9EB1D3143BC9...
...                                                  ...
90690  FFFDAB98EE31EC0CC51169937F859D5B676870C6470C19...
90691  FFFEB058BCC25277E2662DD3E8C0506FB1B23BA4D965EA...
90692  FFFEB27D6E27351D14EB43777F265F694744ABB4B3B7AD...
90693  FFFEB3E291D915645E08FD13A9BFE66B5912FE45306D25...
90694  FFFF8C877BDC3CEFEFD0D4C5F0E8B4BE537F5023A1F31F...
       internalRaterIntercept  internalRaterFactor1  internalRaterReputation  \
0                         NaN                   NaN                      NaN
1                         NaN                   NaN                      NaN
2                         NaN                   NaN                      NaN
3                   -0.908694             -1.348517                 0.311867
4                         NaN                   NaN                      NaN
...                       ...                   ...                      ...
90690                0.138325              0.752118                 0.559713
90691                     NaN                   NaN                      NaN
90692                     NaN                   NaN                      NaN
90693                0.122773             -0.574076                 0.159022
90694                0.279652             -0.503781                 0.443090
       internalRaterInterceptRound2
0                               NaN
1                               NaN
2                               NaN
3                         -0.908694
4                               NaN
...                             ...
90690                      0.138325
90691                           NaN
90692                           NaN
90693                      0.122773
90694                      0.279652
[90695 rows x 5 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 28554, vs. num we are initializing: 90695 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 28554 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 28554, vs. num we are initializing: 90695 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 28554 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 28554, vs. num we are initializing: 90695 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 28554 | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRHInertia (v1.0) elapsed time: 1.66 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: TagFilter (v1.0) | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 161453, vs. num we are initializing: 160377 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 157321 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 4132 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 161453, vs. num we are initializing: 160377 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 157321 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 4132 | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=4.821860 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.421554 | time=99.9s | |
INFO:birdwatch.scoring_rules:Candidate notes prior to tag filtering: 1781655 | |
INFO:birdwatch.scoring_rules:Checking note tags: | |
INFO:birdwatch.scoring_rules:notHelpfulOther | |
INFO:birdwatch.scoring_rules:notHelpfulIncorrect | |
INFO:birdwatch.scoring_rules:notHelpfulSourcesMissingOrUnreliable | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculationOrBias | |
INFO:birdwatch.scoring_rules:notHelpfulMissingKeyPoints | |
INFO:birdwatch.scoring_rules:notHelpfulOutdated | |
INFO:birdwatch.scoring_rules:notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:outlier filtering disabled for tag: notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:notHelpfulArgumentativeOrBiased | |
INFO:birdwatch.scoring_rules:notHelpfulOffTopic | |
INFO:birdwatch.scoring_rules:notHelpfulSpamHarassmentOrAbuse | |
INFO:birdwatch.scoring_rules:notHelpfulIrrelevantSources | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculation | |
INFO:birdwatch.scoring_rules:notHelpfulNoteNotNeeded | |
INFO:birdwatch.scoring_rules:Total {note, tag} pairs where tag filter logic triggered: 222 | |
INFO:birdwatch.scoring_rules:Total unique notes impacted by tag filtering: 164 | |
INFO:birdwatch.constants:Calling score_notes: TagFilter (v1.0) elapsed time: 3.31 secs (0.06 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.315525 | time=3.0s | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scoring_rules.py, in apply_scoring_rules, at line 1099: noteColumns = noteColumns.merge( | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:Applying scoring rule: TagFilter (v1.0) elapsed time: 4.01 secs (0.07 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: CRHSuperThreshold (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: CRHSuperThreshold (v1.0) elapsed time: 0.16 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: CRHSuperThreshold (v1.0) elapsed time: 0.94 secs (0.02 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: ElevatedCRHInertia (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: ElevatedCRHInertia (v1.0) elapsed time: 1.13 secs (0.02 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.287919 | time=5.9s | |
INFO:birdwatch.constants:Applying scoring rule: ElevatedCRHInertia (v1.0) elapsed time: 1.81 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterIncorrect (v1.0) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.285700 | time=8.9s | |
INFO:birdwatch.scoring_rules:Total notes impacted by incorrect filtering: 60 | |
INFO:birdwatch.constants:Calling score_notes: FilterIncorrect (v1.0) elapsed time: 3.17 secs (0.05 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterIncorrect (v1.0) elapsed time: 3.94 secs (0.07 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLowDiligence (v1.0) | |
INFO:birdwatch.pseudo_raters:------------------ | |
INFO:birdwatch.pseudo_raters:Re-scoring all notes with extra rating added: {'raterParticipantId': '-1', 'raterIndex': 382560, 'internalRaterIntercept': -0.44769228, 'internalRaterFactor1': -1.1232908, 'helpfulNum': 1.0} | |
INFO:birdwatch.reputation_matrix_factorization:epoch=115 | loss=2.285573 | time=11.3s | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 28554, vs. num we are initializing: 90695 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 28554 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.355772 | time=0.0s | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 382561, Notes: 1226896 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INIT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/pseudo_raters.py, in _check_note_parameters_same, at line 90: assert (noteParamsFromNewModel == self.noteParams).all().all() | |
PandasTypeError: Type expectation mismatch on noteId: found=bool expected=int64 | |
INFO:birdwatch.scoring_rules:Total notes impacted by low diligence filtering: 6909 | |
INFO:birdwatch.constants:Calling score_notes: FilterLowDiligence (v1.0) elapsed time: 3.15 secs (0.05 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterLowDiligence (v1.0) elapsed time: 3.95 secs (0.07 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLargeFactor (v1.0) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.340458 | time=3.5s | |
INFO:birdwatch.scoring_rules:Total notes impacted by large factor filtering: 86 | |
INFO:birdwatch.constants:Calling score_notes: FilterLargeFactor (v1.0) elapsed time: 0.95 secs (0.02 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.420896 | time=98.6s | |
INFO:birdwatch.constants:Applying scoring rule: FilterLargeFactor (v1.0) elapsed time: 1.66 secs (0.03 mins) | |
INFO:birdwatch.matrix_factorization:epoch 0 0.1583910435438156 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.12053129076957703 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=055 | loss=0.340288 | time=6.3s | |
INFO:birdwatch.diligence_model:Low diligence final loss: 0.3403 | |
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object') | |
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1', | |
'internalNoteFactor1_max', 'internalNoteFactor1_median', | |
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig', | |
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig', | |
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_9 Low Diligence Reputation Model elapsed time: 22.60 secs (0.38 mins) | |
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFGroupScorer_9 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=0.007112 | time=201.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=0.007112 | time=201.8s | |
INFO:birdwatch.helpfulness_model:Helpfulness reputation loss: 0.0071 | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/reputation_scorer.py, in _score_notes_and_users, at line 187: noteStats = noteStats.merge(noteStatusHistory[[c.noteIdKey]].drop_duplicates(), how="outer") | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.scorer:Postprocessing output for ReputationScorer | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 494: noteStats = tagAggregates.merge(noteStats, on=c.noteIdKey, how="outer") | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:compute_scored_notes: compute tag aggregates elapsed time: 10.36 secs (0.17 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:82: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals.drop(columns_to_drop, inplace=True, axis=1) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:84: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ( | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:91: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:95: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:99: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 499: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on num_voters_interval: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.constants:compute_scored_notes: compute incorrect aggregates elapsed time: 9.60 secs (0.16 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.74 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.16 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.86 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.04 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.420391 | time=147.6s | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.73 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.66 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.12 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.80 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRHInertia (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRHInertia (v1.0) elapsed time: 0.99 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRHInertia (v1.0) elapsed time: 1.65 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: TagFilter (v1.0) | |
INFO:birdwatch.scoring_rules:Candidate notes prior to tag filtering: 1777437 | |
INFO:birdwatch.scoring_rules:Checking note tags: | |
INFO:birdwatch.scoring_rules:notHelpfulOther | |
INFO:birdwatch.scoring_rules:notHelpfulIncorrect | |
INFO:birdwatch.scoring_rules:notHelpfulSourcesMissingOrUnreliable | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculationOrBias | |
INFO:birdwatch.scoring_rules:notHelpfulMissingKeyPoints | |
INFO:birdwatch.scoring_rules:notHelpfulOutdated | |
INFO:birdwatch.scoring_rules:notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:outlier filtering disabled for tag: notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:notHelpfulArgumentativeOrBiased | |
INFO:birdwatch.scoring_rules:notHelpfulOffTopic | |
INFO:birdwatch.scoring_rules:notHelpfulSpamHarassmentOrAbuse | |
INFO:birdwatch.scoring_rules:notHelpfulIrrelevantSources | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculation | |
INFO:birdwatch.scoring_rules:notHelpfulNoteNotNeeded | |
INFO:birdwatch.scoring_rules:Total {note, tag} pairs where tag filter logic triggered: 12206 | |
INFO:birdwatch.scoring_rules:Total unique notes impacted by tag filtering: 6395 | |
INFO:birdwatch.constants:Calling score_notes: TagFilter (v1.0) elapsed time: 3.26 secs (0.05 mins) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scoring_rules.py, in apply_scoring_rules, at line 1099: noteColumns = noteColumns.merge( | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:Applying scoring rule: TagFilter (v1.0) elapsed time: 4.00 secs (0.07 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: CRHSuperThreshold (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: CRHSuperThreshold (v1.0) elapsed time: 0.16 secs (0.00 mins) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 41.58 secs (0.69 mins) | |
INFO:birdwatch.constants:Applying scoring rule: CRHSuperThreshold (v1.0) elapsed time: 0.86 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: ElevatedCRHInertia (v1.0) | |
INFO:birdwatch.mf_base_scorer:sn cols: Index(['noteId', 'ratingWeight', 'notHelpfulOtherAdjusted', | |
'notHelpfulIncorrectAdjusted', | |
'notHelpfulSourcesMissingOrUnreliableAdjusted', | |
'notHelpfulOpinionSpeculationOrBiasAdjusted', | |
'notHelpfulMissingKeyPointsAdjusted', 'notHelpfulOutdatedAdjusted', | |
'notHelpfulHardToUnderstandAdjusted', | |
'notHelpfulArgumentativeOrBiasedAdjusted', 'notHelpfulOffTopicAdjusted', | |
'notHelpfulSpamHarassmentOrAbuseAdjusted', | |
'notHelpfulIrrelevantSourcesAdjusted', | |
'notHelpfulOpinionSpeculationAdjusted', | |
'notHelpfulNoteNotNeededAdjusted', 'notHelpfulOtherAdjustedRatio', | |
'notHelpfulIncorrectAdjustedRatio', | |
'notHelpfulSourcesMissingOrUnreliableAdjustedRatio', | |
'notHelpfulOpinionSpeculationOrBiasAdjustedRatio', | |
'notHelpfulMissingKeyPointsAdjustedRatio', | |
'notHelpfulOutdatedAdjustedRatio', | |
'notHelpfulHardToUnderstandAdjustedRatio', | |
'notHelpfulArgumentativeOrBiasedAdjustedRatio', | |
'notHelpfulOffTopicAdjustedRatio', | |
'notHelpfulSpamHarassmentOrAbuseAdjustedRatio', | |
'notHelpfulIrrelevantSourcesAdjustedRatio', | |
'notHelpfulOpinionSpeculationAdjustedRatio', | |
'notHelpfulNoteNotNeededAdjustedRatio', 'helpfulOther', | |
'helpfulInformative', 'helpfulClear', 'helpfulEmpathetic', | |
'helpfulGoodSources', 'helpfulUniqueContext', 'helpfulAddressesClaim', | |
'helpfulImportantContext', 'helpfulUnbiasedLanguage', 'notHelpfulOther', | |
'notHelpfulIncorrect', 'notHelpfulSourcesMissingOrUnreliable', | |
'notHelpfulOpinionSpeculationOrBias', 'notHelpfulMissingKeyPoints', | |
'notHelpfulOutdated', 'notHelpfulHardToUnderstand', | |
'notHelpfulArgumentativeOrBiased', 'notHelpfulOffTopic', | |
'notHelpfulSpamHarassmentOrAbuse', 'notHelpfulIrrelevantSources', | |
'notHelpfulOpinionSpeculation', 'notHelpfulNoteNotNeeded', 'numRatings', | |
'noteAuthorParticipantId', 'classification', 'currentStatus', | |
'internalNoteIntercept', 'internalNoteFactor1', | |
'lowDiligenceNoteIntercept', 'internalNoteFactor1_max', | |
'internalNoteFactor1_median', 'internalNoteFactor1_min', | |
'internalNoteFactor1_refit_orig', 'internalNoteIntercept_median', | |
'internalNoteIntercept_refit_orig', 'ratingCount_all', | |
'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'notHelpfulIncorrect_interval', 'p_incorrect_user_interval', | |
'num_voters_interval', 'tf_idf_incorrect_interval', | |
'internalRatingStatus', 'internalActiveRules', 'activeFilterTags', | |
'crhBool', 'crnhBool', 'awaitingBool'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_8 Final compute scored notes elapsed time: 73.67 secs (1.23 mins) | |
INFO:birdwatch.scorer:Postprocessing output for MFGroupScorer_8 | |
INFO:birdwatch.constants:Calling score_notes: ElevatedCRHInertia (v1.0) elapsed time: 0.97 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: ElevatedCRHInertia (v1.0) elapsed time: 1.65 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterIncorrect (v1.0) | |
INFO:birdwatch.scoring_rules:Total notes impacted by incorrect filtering: 2086 | |
INFO:birdwatch.constants:Calling score_notes: FilterIncorrect (v1.0) elapsed time: 2.92 secs (0.05 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterIncorrect (v1.0) elapsed time: 3.61 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLowDiligence (v1.0) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.419713 | time=146.2s | |
INFO:birdwatch.scoring_rules:Total notes impacted by low diligence filtering: 31685 | |
INFO:birdwatch.constants:Calling score_notes: FilterLowDiligence (v1.0) elapsed time: 3.13 secs (0.05 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterLowDiligence (v1.0) elapsed time: 3.85 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLargeFactor (v1.0) | |
INFO:birdwatch.scoring_rules:Total notes impacted by large factor filtering: 41 | |
INFO:birdwatch.constants:Calling score_notes: FilterLargeFactor (v1.0) elapsed time: 0.81 secs (0.01 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterLargeFactor (v1.0) elapsed time: 1.51 secs (0.03 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=105 | loss=2.420346 | time=171.2s | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 496789, vs. num we are initializing: 722512 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 496789 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.256877 | time=0.7s | |
INFO:birdwatch.matrix_factorization:epoch 20 0.13233225047588348 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10279327630996704 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=105 | loss=2.419666 | time=168.6s | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 496867, vs. num we are initializing: 722844 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 496867 | |
INFO:birdwatch.scorer: Original noteScores length: 1783629 | |
INFO:birdwatch.scorer: Final noteScores length: 1419547 | |
INFO:birdwatch.scorer:ReputationScorer Postprocess output elapsed time: 60.04 secs (1.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.256756 | time=0.7s | |
INFO:birdwatch.run_scoring:MFGroupScorer_7 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 44.59 secs (0.74 mins) | |
INFO:birdwatch.mf_base_scorer:sn cols: Index(['noteId', 'ratingWeight', 'notHelpfulOtherAdjusted', | |
'notHelpfulIncorrectAdjusted', | |
'notHelpfulSourcesMissingOrUnreliableAdjusted', | |
'notHelpfulOpinionSpeculationOrBiasAdjusted', | |
'notHelpfulMissingKeyPointsAdjusted', 'notHelpfulOutdatedAdjusted', | |
'notHelpfulHardToUnderstandAdjusted', | |
'notHelpfulArgumentativeOrBiasedAdjusted', 'notHelpfulOffTopicAdjusted', | |
'notHelpfulSpamHarassmentOrAbuseAdjusted', | |
'notHelpfulIrrelevantSourcesAdjusted', | |
'notHelpfulOpinionSpeculationAdjusted', | |
'notHelpfulNoteNotNeededAdjusted', 'notHelpfulOtherAdjustedRatio', | |
'notHelpfulIncorrectAdjustedRatio', | |
'notHelpfulSourcesMissingOrUnreliableAdjustedRatio', | |
'notHelpfulOpinionSpeculationOrBiasAdjustedRatio', | |
'notHelpfulMissingKeyPointsAdjustedRatio', | |
'notHelpfulOutdatedAdjustedRatio', | |
'notHelpfulHardToUnderstandAdjustedRatio', | |
'notHelpfulArgumentativeOrBiasedAdjustedRatio', | |
'notHelpfulOffTopicAdjustedRatio', | |
'notHelpfulSpamHarassmentOrAbuseAdjustedRatio', | |
'notHelpfulIrrelevantSourcesAdjustedRatio', | |
'notHelpfulOpinionSpeculationAdjustedRatio', | |
'notHelpfulNoteNotNeededAdjustedRatio', 'helpfulOther', | |
'helpfulInformative', 'helpfulClear', 'helpfulEmpathetic', | |
'helpfulGoodSources', 'helpfulUniqueContext', 'helpfulAddressesClaim', | |
'helpfulImportantContext', 'helpfulUnbiasedLanguage', 'notHelpfulOther', | |
'notHelpfulIncorrect', 'notHelpfulSourcesMissingOrUnreliable', | |
'notHelpfulOpinionSpeculationOrBias', 'notHelpfulMissingKeyPoints', | |
'notHelpfulOutdated', 'notHelpfulHardToUnderstand', | |
'notHelpfulArgumentativeOrBiased', 'notHelpfulOffTopic', | |
'notHelpfulSpamHarassmentOrAbuse', 'notHelpfulIrrelevantSources', | |
'notHelpfulOpinionSpeculation', 'notHelpfulNoteNotNeeded', 'numRatings', | |
'noteAuthorParticipantId', 'classification', 'currentStatus', | |
'internalNoteIntercept', 'internalNoteFactor1', | |
'lowDiligenceNoteIntercept', 'internalNoteFactor1_max', | |
'internalNoteFactor1_median', 'internalNoteFactor1_min', | |
'internalNoteFactor1_refit_orig', 'internalNoteIntercept_median', | |
'internalNoteIntercept_refit_orig', 'ratingCount_all', | |
'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'notHelpfulIncorrect_interval', 'p_incorrect_user_interval', | |
'num_voters_interval', 'tf_idf_incorrect_interval', | |
'internalRatingStatus', 'internalActiveRules', 'activeFilterTags', | |
'crhBool', 'crnhBool', 'awaitingBool'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_9 Final compute scored notes elapsed time: 95.26 secs (1.59 mins) | |
INFO:birdwatch.scorer:Postprocessing output for MFGroupScorer_9 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.250742 | time=46.9s | |
INFO:birdwatch.run_scoring:MFGroupScorer_7 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_7 run_scorer_parallelizable: Loading data elapsed time: 29.96 secs (0.50 mins) | |
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_7 set to: 4 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_7. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.scorer: Original noteScores length: 1783629 | |
INFO:birdwatch.scorer: Final noteScores length: 818 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in score_final, at line 1190: noteScores = noteScores.merge( | |
PandasTypeError: Output mismatch on numFinalRoundRatings: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_8 Postprocess output elapsed time: 72.91 secs (1.22 mins) | |
INFO:birdwatch.run_scoring:MFGroupScorer_6 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.250463 | time=45.8s | |
INFO:birdwatch.matrix_factorization:epoch 40 0.13123254477977753 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10156343877315521 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=045 | loss=0.250693 | time=70.5s | |
INFO:birdwatch.diligence_model:Low diligence final loss: 0.2507 | |
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object') | |
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1', | |
'internalNoteFactor1_max', 'internalNoteFactor1_median', | |
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig', | |
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig', | |
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFExpansionPlusScorer Low Diligence Reputation Model elapsed time: 335.86 secs (5.60 mins) | |
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFExpansionPlusScorer | |
INFO:birdwatch.reputation_matrix_factorization:epoch=045 | loss=0.250411 | time=68.5s | |
INFO:birdwatch.diligence_model:Low diligence final loss: 0.2504 | |
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object') | |
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1', | |
'internalNoteFactor1_max', 'internalNoteFactor1_median', | |
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig', | |
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig', | |
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFExpansionScorer Low Diligence Reputation Model elapsed time: 332.26 secs (5.54 mins) | |
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFExpansionScorer | |
INFO:birdwatch.run_scoring:MFGroupScorer_6 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_6 run_scorer_parallelizable: Loading data elapsed time: 28.64 secs (0.48 mins) | |
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_6 set to: 4 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_6. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.scorer: Ratings after group filter: 1789121 | |
INFO:birdwatch.scorer:MFGroupScorer_7 Filter input elapsed time: 44.24 secs (0.74 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note: | |
Num Ratings: 1270048, Num Unique Notes Rated: 83141, Num Unique Raters: 29463 | |
INFO:birdwatch.scorer:MFGroupScorer_7 Prepare ratings elapsed time: 0.63 secs (0.01 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)` | |
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey] | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 12518 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 57414 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 14801 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 12518 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 889049 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 889049 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 12518, Notes: 83062 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 10.703438395415473 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 71.021648825691 | |
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 10.875813615837979 | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.matrix_factorization:epoch 0 0.1511508822441101 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10870592296123505 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.11111214011907578 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07926967740058899 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10906527191400528 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07741215825080872 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10881809890270233 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07724964618682861 | |
INFO:birdwatch.matrix_factorization:Num epochs: 75 | |
INFO:birdwatch.matrix_factorization:epoch 75 0.10878925025463104 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07723891735076904 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.18187516927719116 | |
INFO:birdwatch.scorer:MFGroupScorer_7 Final helpfulness-filtered MF elapsed time: 5.60 secs (0.09 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_7 final scoring, about to call diligence with 889049 final round ratings. | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key] | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key] | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
noteId ... internalNoteInterceptRound2 | |
0 1783356907102572640 ... 0.408297 | |
1 1817651698652926356 ... -1.424548 | |
2 1819455029834555469 ... -1.252842 | |
3 1819460608976118232 ... 2.326639 | |
4 1832658971833806987 ... 0.454524 | |
... ... ... ... | |
81268 1815846943219757100 ... -0.247081 | |
81269 1835317627482243467 ... -0.315699 | |
81270 1761711989527650551 ... -0.393296 | |
81271 1739385025395569106 ... 0.560137 | |
81272 1774629591619096646 ... -0.301313 | |
[81273 rows x 4 columns], | |
raterInitState: | |
raterParticipantId ... internalRaterInterceptRound2 | |
0 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... ... NaN | |
1 0001C21FD89AC65310D4D74174C0986CDF457DA24DADAB... ... -0.027743 | |
2 00029D1FDD352D79B5073189C3F2BDF6377581F50D66C1... ... NaN | |
3 0003E67BB62E658363186A00B13637CF1A58748C4E4ECE... ... -0.171895 | |
4 00053CDCAC04E3692F4A01305C8F3D093CCE221157D539... ... NaN | |
... ... ... ... | |
57409 FFF7636C99E1370B663778061CD0AF5458555FDA579F88... ... NaN | |
57410 FFFA43EFB0AAB3BFD273666FF123BFE69D863B9A2F5E44... ... NaN | |
57411 FFFBC05DB8408BB532985642C4DE00EC619B062CB60E2E... ... -0.155000 | |
57412 FFFC011F23086D8153F0A3FF336F33EE80521EC35F9ACD... ... NaN | |
57413 FFFDAB98EE31EC0CC51169937F859D5B676870C6470C19... ... NaN | |
[57414 rows x 5 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 12518, vs. num we are initializing: 57414 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 12518 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 12518, vs. num we are initializing: 57414 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 12518 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 12518, vs. num we are initializing: 57414 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 12518 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 83062, vs. num we are initializing: 81273 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 81100 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 1962 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 83062, vs. num we are initializing: 81273 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 81100 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 1962 | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=5.753833 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.885844 | time=0.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.854939 | time=1.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.851777 | time=2.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.851458 | time=3.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=140 | loss=2.851408 | time=3.7s | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 12518, vs. num we are initializing: 57414 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 12518 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.543244 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.476768 | time=0.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.475740 | time=1.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.475706 | time=1.9s | |
INFO:birdwatch.diligence_model:Low diligence final loss: 0.4757 | |
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object') | |
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1', | |
'internalNoteFactor1_max', 'internalNoteFactor1_median', | |
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig', | |
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig', | |
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_7 Low Diligence Reputation Model elapsed time: 7.30 secs (0.12 mins) | |
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFGroupScorer_7 | |
INFO:birdwatch.scorer: Original noteScores length: 1783629 | |
INFO:birdwatch.scorer: Final noteScores length: 48338 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in score_final, at line 1190: noteScores = noteScores.merge( | |
PandasTypeError: Output mismatch on numFinalRoundRatings: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_9 Postprocess output elapsed time: 74.52 secs (1.24 mins) | |
INFO:birdwatch.run_scoring:MFGroupScorer_5 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 494: noteStats = tagAggregates.merge(noteStats, on=c.noteIdKey, how="outer") | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:compute_scored_notes: compute tag aggregates elapsed time: 4.07 secs (0.07 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:82: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals.drop(columns_to_drop, inplace=True, axis=1) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:84: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ( | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:91: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:95: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:99: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
INFO:birdwatch.matrix_factorization:epoch 60 0.13106535375118256 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10125327855348587 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 499: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on num_voters_interval: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.constants:compute_scored_notes: compute incorrect aggregates elapsed time: 3.57 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.71 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.13 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.83 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.66 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.63 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.73 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRHInertia (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRHInertia (v1.0) elapsed time: 0.99 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRHInertia (v1.0) elapsed time: 1.62 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: TagFilter (v1.0) | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scoring_rules:Candidate notes prior to tag filtering: 1778293 | |
INFO:birdwatch.scoring_rules:Checking note tags: | |
INFO:birdwatch.scoring_rules:notHelpfulOther | |
INFO:birdwatch.scoring_rules:notHelpfulIncorrect | |
INFO:birdwatch.scoring_rules:notHelpfulSourcesMissingOrUnreliable | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculationOrBias | |
INFO:birdwatch.scoring_rules:notHelpfulMissingKeyPoints | |
INFO:birdwatch.scoring_rules:notHelpfulOutdated | |
INFO:birdwatch.scoring_rules:notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:outlier filtering disabled for tag: notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:notHelpfulArgumentativeOrBiased | |
INFO:birdwatch.scoring_rules:notHelpfulOffTopic | |
INFO:birdwatch.scoring_rules:notHelpfulSpamHarassmentOrAbuse | |
INFO:birdwatch.scoring_rules:notHelpfulIrrelevantSources | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculation | |
INFO:birdwatch.scoring_rules:notHelpfulNoteNotNeeded | |
INFO:birdwatch.scoring_rules:Total {note, tag} pairs where tag filter logic triggered: 4697 | |
INFO:birdwatch.scoring_rules:Total unique notes impacted by tag filtering: 2504 | |
INFO:birdwatch.constants:Calling score_notes: TagFilter (v1.0) elapsed time: 3.11 secs (0.05 mins) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scoring_rules.py, in apply_scoring_rules, at line 1099: noteColumns = noteColumns.merge( | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:Applying scoring rule: TagFilter (v1.0) elapsed time: 3.78 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: CRHSuperThreshold (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: CRHSuperThreshold (v1.0) elapsed time: 0.14 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: CRHSuperThreshold (v1.0) elapsed time: 0.81 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: ElevatedCRHInertia (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: ElevatedCRHInertia (v1.0) elapsed time: 0.99 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: ElevatedCRHInertia (v1.0) elapsed time: 1.66 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterIncorrect (v1.0) | |
INFO:birdwatch.scorer: Ratings after group filter: 5575831 | |
INFO:birdwatch.scorer:MFGroupScorer_6 Filter input elapsed time: 47.55 secs (0.79 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.scoring_rules:Total notes impacted by incorrect filtering: 730 | |
INFO:birdwatch.constants:Calling score_notes: FilterIncorrect (v1.0) elapsed time: 2.92 secs (0.05 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterIncorrect (v1.0) elapsed time: 3.59 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLowDiligence (v1.0) | |
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note: | |
Num Ratings: 4810765, Num Unique Notes Rated: 214739, Num Unique Raters: 40001 | |
INFO:birdwatch.scorer:MFGroupScorer_6 Prepare ratings elapsed time: 2.78 secs (0.05 mins) | |
INFO:birdwatch.scoring_rules:Total notes impacted by low diligence filtering: 20069 | |
INFO:birdwatch.constants:Calling score_notes: FilterLowDiligence (v1.0) elapsed time: 2.89 secs (0.05 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)` | |
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey] | |
INFO:birdwatch.constants:Applying scoring rule: FilterLowDiligence (v1.0) elapsed time: 3.54 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLargeFactor (v1.0) | |
INFO:birdwatch.scoring_rules:Total notes impacted by large factor filtering: 98 | |
INFO:birdwatch.constants:Calling score_notes: FilterLargeFactor (v1.0) elapsed time: 0.87 secs (0.01 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterLargeFactor (v1.0) elapsed time: 1.50 secs (0.03 mins) | |
INFO:birdwatch.run_scoring:MFGroupScorer_5 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_5 run_scorer_parallelizable: Loading data elapsed time: 27.43 secs (0.46 mins) | |
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_5 set to: 4 | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 22064 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 96153 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 23491 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_5. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 22064 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 3035279 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 3035279 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 22064, Notes: 213702 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 14.203325191154036 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 137.5670322697607 | |
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 14.27394280656491 | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.matrix_factorization:epoch 0 0.12733842432498932 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09110958874225616 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.09962290525436401 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06903161853551865 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.09806198626756668 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06802837550640106 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09787201136350632 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06780334562063217 | |
INFO:birdwatch.matrix_factorization:Num epochs: 79 | |
INFO:birdwatch.matrix_factorization:epoch 79 0.09784702211618423 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06776903569698334 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17027948796749115 | |
INFO:birdwatch.scorer:MFGroupScorer_6 Final helpfulness-filtered MF elapsed time: 21.53 secs (0.36 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_6 final scoring, about to call diligence with 3035279 final round ratings. | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key] | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key] | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
noteId internalNoteIntercept internalNoteFactor1 \ | |
0 1699159156060475887 0.080400 1.458658 | |
1 1708036258310607099 -1.405261 2.748523 | |
2 1708634843616248157 -1.419548 3.016051 | |
3 1708698407043252372 1.965857 0.520041 | |
4 1708722796358963422 -1.201522 3.183873 | |
... ... ... ... | |
212397 1749913089628152293 0.975825 1.941516 | |
212398 1773318217337049283 -0.334301 -0.556813 | |
212399 1703939069246402796 -0.199138 0.891681 | |
212400 1872283842301583792 1.159355 -2.065604 | |
212401 1872298791140782443 -0.342290 0.847621 | |
internalNoteInterceptRound2 | |
0 0.080400 | |
1 -1.405261 | |
2 -1.419548 | |
3 1.965857 | |
4 -1.201522 | |
... ... | |
212397 0.975825 | |
212398 -0.334301 | |
212399 -0.199138 | |
212400 1.159355 | |
212401 -0.342290 | |
[212402 rows x 4 columns], | |
raterInitState: | |
raterParticipantId \ | |
0 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... | |
1 00018DBB934257251EBCEE91D0722C71B7DD592A571398... | |
2 0002188E5ED3028646C97CBE9ADCD12CB5B8BFAF8819BD... | |
3 0002725E706CF18C040E21F30CE2D39994513C3BB8CF58... | |
4 00029D1FDD352D79B5073189C3F2BDF6377581F50D66C1... | |
... ... | |
96148 FFFDAB98EE31EC0CC51169937F859D5B676870C6470C19... | |
96149 FFFEB058BCC25277E2662DD3E8C0506FB1B23BA4D965EA... | |
96150 FFFEB27D6E27351D14EB43777F265F694744ABB4B3B7AD... | |
96151 FFFF0C7BF4089C6436CAB332B309A1A81C21E11CD61CE4... | |
96152 FFFFAB2FDBC1968F4CFE97A86D88963D702B636365B6CD... | |
internalRaterIntercept internalRaterFactor1 internalRaterReputation \ | |
0 NaN NaN NaN | |
1 NaN NaN NaN | |
2 -0.002118 -2.059390 0.186737 | |
3 NaN NaN NaN | |
4 NaN NaN NaN | |
... ... ... ... | |
96148 NaN NaN NaN | |
96149 NaN NaN NaN | |
96150 NaN NaN NaN | |
96151 NaN NaN NaN | |
96152 -0.152272 -0.288305 0.448480 | |
internalRaterInterceptRound2 | |
0 NaN | |
1 NaN | |
2 -0.002118 | |
3 NaN | |
4 NaN | |
... ... | |
96148 NaN | |
96149 NaN | |
96150 NaN | |
96151 NaN | |
96152 -0.152272 | |
[96153 rows x 5 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 22064, vs. num we are initializing: 96153 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 22064 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 22064, vs. num we are initializing: 96153 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 22064 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 22064, vs. num we are initializing: 96153 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 22064 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 213702, vs. num we are initializing: 212402 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 207350 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 6352 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 213702, vs. num we are initializing: 212402 | |
INFO:birdwatch.matrix_factorization:Num epochs: 79 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 207350 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 6352 | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=5.528307 | time=0.0s | |
INFO:birdwatch.matrix_factorization:epoch 79 0.1310441642999649 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1012219786643982 | |
INFO:birdwatch.constants:Pseudo: fit all notes with raters constant elapsed time: 234.86 secs (3.91 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.599566 | time=2.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.567772 | time=5.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.564870 | time=7.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.564632 | time=10.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=130 | loss=2.564613 | time=11.4s | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 22064, vs. num we are initializing: 96153 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 22064 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.430499 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.386934 | time=2.6s | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 43.16 secs (0.72 mins) | |
INFO:birdwatch.mf_base_scorer:sn cols: Index(['noteId', 'ratingWeight', 'notHelpfulOtherAdjusted', | |
'notHelpfulIncorrectAdjusted', | |
'notHelpfulSourcesMissingOrUnreliableAdjusted', | |
'notHelpfulOpinionSpeculationOrBiasAdjusted', | |
'notHelpfulMissingKeyPointsAdjusted', 'notHelpfulOutdatedAdjusted', | |
'notHelpfulHardToUnderstandAdjusted', | |
'notHelpfulArgumentativeOrBiasedAdjusted', 'notHelpfulOffTopicAdjusted', | |
'notHelpfulSpamHarassmentOrAbuseAdjusted', | |
'notHelpfulIrrelevantSourcesAdjusted', | |
'notHelpfulOpinionSpeculationAdjusted', | |
'notHelpfulNoteNotNeededAdjusted', 'notHelpfulOtherAdjustedRatio', | |
'notHelpfulIncorrectAdjustedRatio', | |
'notHelpfulSourcesMissingOrUnreliableAdjustedRatio', | |
'notHelpfulOpinionSpeculationOrBiasAdjustedRatio', | |
'notHelpfulMissingKeyPointsAdjustedRatio', | |
'notHelpfulOutdatedAdjustedRatio', | |
'notHelpfulHardToUnderstandAdjustedRatio', | |
'notHelpfulArgumentativeOrBiasedAdjustedRatio', | |
'notHelpfulOffTopicAdjustedRatio', | |
'notHelpfulSpamHarassmentOrAbuseAdjustedRatio', | |
'notHelpfulIrrelevantSourcesAdjustedRatio', | |
'notHelpfulOpinionSpeculationAdjustedRatio', | |
'notHelpfulNoteNotNeededAdjustedRatio', 'helpfulOther', | |
'helpfulInformative', 'helpfulClear', 'helpfulEmpathetic', | |
'helpfulGoodSources', 'helpfulUniqueContext', 'helpfulAddressesClaim', | |
'helpfulImportantContext', 'helpfulUnbiasedLanguage', 'notHelpfulOther', | |
'notHelpfulIncorrect', 'notHelpfulSourcesMissingOrUnreliable', | |
'notHelpfulOpinionSpeculationOrBias', 'notHelpfulMissingKeyPoints', | |
'notHelpfulOutdated', 'notHelpfulHardToUnderstand', | |
'notHelpfulArgumentativeOrBiased', 'notHelpfulOffTopic', | |
'notHelpfulSpamHarassmentOrAbuse', 'notHelpfulIrrelevantSources', | |
'notHelpfulOpinionSpeculation', 'notHelpfulNoteNotNeeded', 'numRatings', | |
'noteAuthorParticipantId', 'classification', 'currentStatus', | |
'internalNoteIntercept', 'internalNoteFactor1', | |
'lowDiligenceNoteIntercept', 'internalNoteFactor1_max', | |
'internalNoteFactor1_median', 'internalNoteFactor1_min', | |
'internalNoteFactor1_refit_orig', 'internalNoteIntercept_median', | |
'internalNoteIntercept_refit_orig', 'ratingCount_all', | |
'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'notHelpfulIncorrect_interval', 'p_incorrect_user_interval', | |
'num_voters_interval', 'tf_idf_incorrect_interval', | |
'internalRatingStatus', 'internalActiveRules', 'activeFilterTags', | |
'crhBool', 'crnhBool', 'awaitingBool'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_7 Final compute scored notes elapsed time: 77.13 secs (1.29 mins) | |
INFO:birdwatch.scorer:Postprocessing output for MFGroupScorer_7 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.386325 | time=5.2s | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=070 | loss=0.386306 | time=6.1s | |
INFO:birdwatch.diligence_model:Low diligence final loss: 0.3863 | |
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object') | |
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1', | |
'internalNoteFactor1_max', 'internalNoteFactor1_median', | |
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig', | |
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig', | |
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_6 Low Diligence Reputation Model elapsed time: 22.18 secs (0.37 mins) | |
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFGroupScorer_6 | |
INFO:birdwatch.pseudo_raters:------------------ | |
INFO:birdwatch.pseudo_raters:Re-scoring all notes with extra rating added: {'raterParticipantId': '-2', 'raterIndex': 382561, 'internalRaterIntercept': -0.44769228, 'internalRaterFactor1': 0.0, 'helpfulNum': 1.0} | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 382561, Notes: 1226896 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INIT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/pseudo_raters.py, in _check_note_parameters_same, at line 90: assert (noteParamsFromNewModel == self.noteParams).all().all() | |
PandasTypeError: Type expectation mismatch on noteId: found=bool expected=int64 | |
INFO:birdwatch.scorer: Ratings after group filter: 565869 | |
INFO:birdwatch.scorer:MFGroupScorer_5 Filter input elapsed time: 49.96 secs (0.83 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note: | |
Num Ratings: 256086, Num Unique Notes Rated: 23716, Num Unique Raters: 6216 | |
INFO:birdwatch.scorer:MFGroupScorer_5 Prepare ratings elapsed time: 0.23 secs (0.00 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)` | |
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey] | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 3042 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 17268 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 3510 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 3042 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 190605 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 190605 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 3042, Notes: 23700 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 8.04240506329114 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 62.65779092702169 | |
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 8.278428426216466 | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.matrix_factorization:epoch 0 0.15058447420597076 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10152675956487656 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.09538181126117706 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05800676345825195 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.09152333438396454 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05575576424598694 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09121275693178177 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05589892715215683 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.09117168188095093 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05603545904159546 | |
INFO:birdwatch.matrix_factorization:Num epochs: 84 | |
INFO:birdwatch.matrix_factorization:epoch 84 0.09116873145103455 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05606779828667641 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.18550153076648712 | |
INFO:birdwatch.scorer:MFGroupScorer_5 Final helpfulness-filtered MF elapsed time: 1.48 secs (0.02 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_5 final scoring, about to call diligence with 190605 final round ratings. | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key] | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key] | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
                    noteId  internalNoteIntercept  internalNoteFactor1  \
0      1685671586295099392              -0.198606            -2.415438
1      1694265338060222629              -3.474766             0.750264
2      1708971742826213665              -1.393753             0.436008
3      1709029742886760650              -1.573781             0.471117
4      1710469668035801230              -1.503085             0.295592
...                    ...                    ...                  ...
22622  1796074984558710823              -0.346538            -0.960280
22623  1853499093554770105              -0.346727            -0.694310
22624  1779889399464935794               0.304722            -1.039934
22625  1821313194553700446               0.219867            -1.038188
22626  1821335118532731025               0.304476            -1.039811
       internalNoteInterceptRound2
0                        -0.198606
1                        -3.474766
2                        -1.393753
3                        -1.573781
4                        -1.503085
...                            ...
22622                    -0.346538
22623                    -0.346727
22624                     0.304722
22625                     0.219867
22626                     0.304476
[22627 rows x 4 columns],
raterInitState:
                                      raterParticipantId  \
0      00029D1FDD352D79B5073189C3F2BDF6377581F50D66C1...
1      00053CDCAC04E3692F4A01305C8F3D093CCE221157D539...
2      0005983E6E18862483AB372C5B61FEBC1F8A573E7701F9...
3      000F1687C56AB92D846F2B9BFA71AE16D8A88426754E3B...
4      0011AB5425173F62E5D4A1787E34ED324BDD5807D4C3B8...
...                                                  ...
17263  FFDC71F0AE061FDEC1E553DBEADDD7EFBD520C6EA87C6F...
17264  FFE87CF4860C52665B228E9F345BB3EE183994416FA6D7...
17265  FFEA6CF8956CF5972B2086A17F147FCC0B59CBD4CE0C7E...
17266  FFF3E935633C6870DE7674D0681C5821BC408073C84A36...
17267  FFF6DBEDE9ED4DC6A61291E33742D1805155E385475E43...
       internalRaterIntercept  internalRaterFactor1  internalRaterReputation  \
0                         NaN                   NaN                      NaN
1                         NaN                   NaN                      NaN
2                         NaN                   NaN                      NaN
3                         NaN                   NaN                      NaN
4                         NaN                   NaN                      NaN
...                       ...                   ...                      ...
17263                     NaN                   NaN                      NaN
17264                     NaN                   NaN                      NaN
17265                     NaN                   NaN                      NaN
17266                     NaN                   NaN                      NaN
17267                     NaN                   NaN                      NaN
       internalRaterInterceptRound2
0                               NaN
1                               NaN
2                               NaN
3                               NaN
4                               NaN
...                             ...
17263                           NaN
17264                           NaN
17265                           NaN
17266                           NaN
17267                           NaN
[17268 rows x 5 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 3042, vs. num we are initializing: 17268 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 3042 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 3042, vs. num we are initializing: 17268 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 3042 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 3042, vs. num we are initializing: 17268 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 3042 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 23700, vs. num we are initializing: 22627 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 23201 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 499 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 23700, vs. num we are initializing: 22627 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 23201 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 499 | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=7.319346 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.599784 | time=0.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.558967 | time=0.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.553374 | time=0.7s | |
INFO:birdwatch.matrix_factorization:epoch 0 0.16021329164505005 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1222662404179573 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.552587 | time=0.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.552363 | time=1.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.552264 | time=1.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=190 | loss=2.552243 | time=1.4s | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 3042, vs. num we are initializing: 17268 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 3042 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.591146 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.499323 | time=0.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.497705 | time=0.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.497654 | time=0.6s | |
INFO:birdwatch.diligence_model:Low diligence final loss: 0.4977 | |
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object') | |
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1', | |
'internalNoteFactor1_max', 'internalNoteFactor1_median', | |
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig', | |
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig', | |
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_5 Low Diligence Reputation Model elapsed time: 2.38 secs (0.04 mins) | |
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFGroupScorer_5 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 494: noteStats = tagAggregates.merge(noteStats, on=c.noteIdKey, how="outer") | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:compute_scored_notes: compute tag aggregates elapsed time: 2.52 secs (0.04 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:82: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals.drop(columns_to_drop, inplace=True, axis=1) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:84: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ( | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:91: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:95: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:99: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 494: noteStats = tagAggregates.merge(noteStats, on=c.noteIdKey, how="outer") | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:compute_scored_notes: compute tag aggregates elapsed time: 8.85 secs (0.15 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 499: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on num_voters_interval: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.constants:compute_scored_notes: compute incorrect aggregates elapsed time: 2.99 secs (0.05 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.70 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.16 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.83 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.68 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.65 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:82: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals.drop(columns_to_drop, inplace=True, axis=1) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:84: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ( | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:91: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:95: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:99: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.73 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRHInertia (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRHInertia (v1.0) elapsed time: 1.06 secs (0.02 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 499: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on num_voters_interval: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.constants:compute_scored_notes: compute incorrect aggregates elapsed time: 7.39 secs (0.12 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRHInertia (v1.0) elapsed time: 1.79 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: TagFilter (v1.0) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.63 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.13 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.77 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.03 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.68 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.64 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Candidate notes prior to tag filtering: 1782230 | |
INFO:birdwatch.scoring_rules:Checking note tags: | |
INFO:birdwatch.scoring_rules:notHelpfulOther | |
INFO:birdwatch.scoring_rules:notHelpfulIncorrect | |
INFO:birdwatch.scoring_rules:notHelpfulSourcesMissingOrUnreliable | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculationOrBias | |
INFO:birdwatch.scoring_rules:notHelpfulMissingKeyPoints | |
INFO:birdwatch.scoring_rules:notHelpfulOutdated | |
INFO:birdwatch.scoring_rules:notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:outlier filtering disabled for tag: notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:notHelpfulArgumentativeOrBiased | |
INFO:birdwatch.scoring_rules:notHelpfulOffTopic | |
INFO:birdwatch.scoring_rules:notHelpfulSpamHarassmentOrAbuse | |
INFO:birdwatch.scoring_rules:notHelpfulIrrelevantSources | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculation | |
INFO:birdwatch.scoring_rules:notHelpfulNoteNotNeeded | |
INFO:birdwatch.scoring_rules:Total {note, tag} pairs where tag filter logic triggered: 283 | |
INFO:birdwatch.scoring_rules:Total unique notes impacted by tag filtering: 183 | |
INFO:birdwatch.constants:Calling score_notes: TagFilter (v1.0) elapsed time: 3.27 secs (0.05 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.75 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRHInertia (v1.0) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scoring_rules.py, in apply_scoring_rules, at line 1099: noteColumns = noteColumns.merge( | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:Applying scoring rule: TagFilter (v1.0) elapsed time: 3.92 secs (0.07 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: CRHSuperThreshold (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: CRHSuperThreshold (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRHInertia (v1.0) elapsed time: 0.99 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: CRHSuperThreshold (v1.0) elapsed time: 0.74 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: ElevatedCRHInertia (v1.0) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRHInertia (v1.0) elapsed time: 1.64 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: TagFilter (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: ElevatedCRHInertia (v1.0) elapsed time: 1.02 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: ElevatedCRHInertia (v1.0) elapsed time: 1.72 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterIncorrect (v1.0) | |
INFO:birdwatch.scoring_rules:Candidate notes prior to tag filtering: 1771134 | |
INFO:birdwatch.scoring_rules:Checking note tags: | |
INFO:birdwatch.scoring_rules:notHelpfulOther | |
INFO:birdwatch.scoring_rules:notHelpfulIncorrect | |
INFO:birdwatch.scoring_rules:notHelpfulSourcesMissingOrUnreliable | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculationOrBias | |
INFO:birdwatch.scoring_rules:notHelpfulMissingKeyPoints | |
INFO:birdwatch.scoring_rules:notHelpfulOutdated | |
INFO:birdwatch.scoring_rules:notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:outlier filtering disabled for tag: notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:notHelpfulArgumentativeOrBiased | |
INFO:birdwatch.scoring_rules:notHelpfulOffTopic | |
INFO:birdwatch.scoring_rules:notHelpfulSpamHarassmentOrAbuse | |
INFO:birdwatch.scoring_rules:notHelpfulIrrelevantSources | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculation | |
INFO:birdwatch.scoring_rules:notHelpfulNoteNotNeeded | |
INFO:birdwatch.scoring_rules:Total {note, tag} pairs where tag filter logic triggered: 9549 | |
INFO:birdwatch.scoring_rules:Total unique notes impacted by tag filtering: 5520 | |
INFO:birdwatch.constants:Calling score_notes: TagFilter (v1.0) elapsed time: 3.18 secs (0.05 mins) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scoring_rules.py, in apply_scoring_rules, at line 1099: noteColumns = noteColumns.merge( | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:Applying scoring rule: TagFilter (v1.0) elapsed time: 3.87 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: CRHSuperThreshold (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: CRHSuperThreshold (v1.0) elapsed time: 0.14 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Total notes impacted by incorrect filtering: 71 | |
INFO:birdwatch.constants:Calling score_notes: FilterIncorrect (v1.0) elapsed time: 3.22 secs (0.05 mins) | |
INFO:birdwatch.constants:Applying scoring rule: CRHSuperThreshold (v1.0) elapsed time: 0.80 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: ElevatedCRHInertia (v1.0) | |
INFO:birdwatch.constants:Applying scoring rule: FilterIncorrect (v1.0) elapsed time: 3.86 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLowDiligence (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: ElevatedCRHInertia (v1.0) elapsed time: 1.00 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: ElevatedCRHInertia (v1.0) elapsed time: 1.65 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterIncorrect (v1.0) | |
INFO:birdwatch.scoring_rules:Total notes impacted by low diligence filtering: 4847 | |
INFO:birdwatch.constants:Calling score_notes: FilterLowDiligence (v1.0) elapsed time: 3.06 secs (0.05 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterLowDiligence (v1.0) elapsed time: 3.78 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLargeFactor (v1.0) | |
INFO:birdwatch.scoring_rules:Total notes impacted by incorrect filtering: 1497 | |
INFO:birdwatch.constants:Calling score_notes: FilterIncorrect (v1.0) elapsed time: 2.86 secs (0.05 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterIncorrect (v1.0) elapsed time: 3.50 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLowDiligence (v1.0) | |
INFO:birdwatch.scoring_rules:Total notes impacted by large factor filtering: 15 | |
INFO:birdwatch.constants:Calling score_notes: FilterLargeFactor (v1.0) elapsed time: 0.92 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterLargeFactor (v1.0) elapsed time: 1.58 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Total notes impacted by low diligence filtering: 42091 | |
INFO:birdwatch.constants:Calling score_notes: FilterLowDiligence (v1.0) elapsed time: 2.85 secs (0.05 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterLowDiligence (v1.0) elapsed time: 3.54 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLargeFactor (v1.0) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Total notes impacted by large factor filtering: 96 | |
INFO:birdwatch.constants:Calling score_notes: FilterLargeFactor (v1.0) elapsed time: 0.85 secs (0.01 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterLargeFactor (v1.0) elapsed time: 1.50 secs (0.02 mins) | |
INFO:birdwatch.scorer: Original noteScores length: 1783629 | |
INFO:birdwatch.scorer: Final noteScores length: 8848 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.13505569100379944 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10621832311153412 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in score_final, at line 1190: noteScores = noteScores.merge( | |
PandasTypeError: Output mismatch on numFinalRoundRatings: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_7 Postprocess output elapsed time: 67.65 secs (1.13 mins) | |
INFO:birdwatch.run_scoring:MFGroupScorer_4 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 43.74 secs (0.73 mins) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 39.80 secs (0.66 mins) | |
INFO:birdwatch.mf_base_scorer:sn cols: Index(['noteId', 'ratingWeight', 'notHelpfulOtherAdjusted', | |
'notHelpfulIncorrectAdjusted', | |
'notHelpfulSourcesMissingOrUnreliableAdjusted', | |
'notHelpfulOpinionSpeculationOrBiasAdjusted', | |
'notHelpfulMissingKeyPointsAdjusted', 'notHelpfulOutdatedAdjusted', | |
'notHelpfulHardToUnderstandAdjusted', | |
'notHelpfulArgumentativeOrBiasedAdjusted', 'notHelpfulOffTopicAdjusted', | |
'notHelpfulSpamHarassmentOrAbuseAdjusted', | |
'notHelpfulIrrelevantSourcesAdjusted', | |
'notHelpfulOpinionSpeculationAdjusted', | |
'notHelpfulNoteNotNeededAdjusted', 'notHelpfulOtherAdjustedRatio', | |
'notHelpfulIncorrectAdjustedRatio', | |
'notHelpfulSourcesMissingOrUnreliableAdjustedRatio', | |
'notHelpfulOpinionSpeculationOrBiasAdjustedRatio', | |
'notHelpfulMissingKeyPointsAdjustedRatio', | |
'notHelpfulOutdatedAdjustedRatio', | |
'notHelpfulHardToUnderstandAdjustedRatio', | |
'notHelpfulArgumentativeOrBiasedAdjustedRatio', | |
'notHelpfulOffTopicAdjustedRatio', | |
'notHelpfulSpamHarassmentOrAbuseAdjustedRatio', | |
'notHelpfulIrrelevantSourcesAdjustedRatio', | |
'notHelpfulOpinionSpeculationAdjustedRatio', | |
'notHelpfulNoteNotNeededAdjustedRatio', 'helpfulOther', | |
'helpfulInformative', 'helpfulClear', 'helpfulEmpathetic', | |
'helpfulGoodSources', 'helpfulUniqueContext', 'helpfulAddressesClaim', | |
'helpfulImportantContext', 'helpfulUnbiasedLanguage', 'notHelpfulOther', | |
'notHelpfulIncorrect', 'notHelpfulSourcesMissingOrUnreliable', | |
'notHelpfulOpinionSpeculationOrBias', 'notHelpfulMissingKeyPoints', | |
'notHelpfulOutdated', 'notHelpfulHardToUnderstand', | |
'notHelpfulArgumentativeOrBiased', 'notHelpfulOffTopic', | |
'notHelpfulSpamHarassmentOrAbuse', 'notHelpfulIrrelevantSources', | |
'notHelpfulOpinionSpeculation', 'notHelpfulNoteNotNeeded', 'numRatings', | |
'noteAuthorParticipantId', 'classification', 'currentStatus', | |
'internalNoteIntercept', 'internalNoteFactor1', | |
'lowDiligenceNoteIntercept', 'internalNoteFactor1_max', | |
'internalNoteFactor1_median', 'internalNoteFactor1_min', | |
'internalNoteFactor1_refit_orig', 'internalNoteIntercept_median', | |
'internalNoteIntercept_refit_orig', 'ratingCount_all', | |
'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'notHelpfulIncorrect_interval', 'p_incorrect_user_interval', | |
'num_voters_interval', 'tf_idf_incorrect_interval', | |
'internalRatingStatus', 'internalActiveRules', 'activeFilterTags', | |
'crhBool', 'crnhBool', 'awaitingBool'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_5 Final compute scored notes elapsed time: 75.26 secs (1.25 mins) | |
INFO:birdwatch.scorer:Postprocessing output for MFGroupScorer_5 | |
INFO:birdwatch.mf_base_scorer:sn cols: Index(['noteId', 'ratingWeight', 'notHelpfulOtherAdjusted', | |
'notHelpfulIncorrectAdjusted', | |
'notHelpfulSourcesMissingOrUnreliableAdjusted', | |
'notHelpfulOpinionSpeculationOrBiasAdjusted', | |
'notHelpfulMissingKeyPointsAdjusted', 'notHelpfulOutdatedAdjusted', | |
'notHelpfulHardToUnderstandAdjusted', | |
'notHelpfulArgumentativeOrBiasedAdjusted', 'notHelpfulOffTopicAdjusted', | |
'notHelpfulSpamHarassmentOrAbuseAdjusted', | |
'notHelpfulIrrelevantSourcesAdjusted', | |
'notHelpfulOpinionSpeculationAdjusted', | |
'notHelpfulNoteNotNeededAdjusted', 'notHelpfulOtherAdjustedRatio', | |
'notHelpfulIncorrectAdjustedRatio', | |
'notHelpfulSourcesMissingOrUnreliableAdjustedRatio', | |
'notHelpfulOpinionSpeculationOrBiasAdjustedRatio', | |
'notHelpfulMissingKeyPointsAdjustedRatio', | |
'notHelpfulOutdatedAdjustedRatio', | |
'notHelpfulHardToUnderstandAdjustedRatio', | |
'notHelpfulArgumentativeOrBiasedAdjustedRatio', | |
'notHelpfulOffTopicAdjustedRatio', | |
'notHelpfulSpamHarassmentOrAbuseAdjustedRatio', | |
'notHelpfulIrrelevantSourcesAdjustedRatio', | |
'notHelpfulOpinionSpeculationAdjustedRatio', | |
'notHelpfulNoteNotNeededAdjustedRatio', 'helpfulOther', | |
'helpfulInformative', 'helpfulClear', 'helpfulEmpathetic', | |
'helpfulGoodSources', 'helpfulUniqueContext', 'helpfulAddressesClaim', | |
'helpfulImportantContext', 'helpfulUnbiasedLanguage', 'notHelpfulOther', | |
'notHelpfulIncorrect', 'notHelpfulSourcesMissingOrUnreliable', | |
'notHelpfulOpinionSpeculationOrBias', 'notHelpfulMissingKeyPoints', | |
'notHelpfulOutdated', 'notHelpfulHardToUnderstand', | |
'notHelpfulArgumentativeOrBiased', 'notHelpfulOffTopic', | |
'notHelpfulSpamHarassmentOrAbuse', 'notHelpfulIrrelevantSources', | |
'notHelpfulOpinionSpeculation', 'notHelpfulNoteNotNeeded', 'numRatings', | |
'noteAuthorParticipantId', 'classification', 'currentStatus', | |
'internalNoteIntercept', 'internalNoteFactor1', | |
'lowDiligenceNoteIntercept', 'internalNoteFactor1_max', | |
'internalNoteFactor1_median', 'internalNoteFactor1_min', | |
'internalNoteFactor1_refit_orig', 'internalNoteIntercept_median', | |
'internalNoteIntercept_refit_orig', 'ratingCount_all', | |
'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'notHelpfulIncorrect_interval', 'p_incorrect_user_interval', | |
'num_voters_interval', 'tf_idf_incorrect_interval', | |
'internalRatingStatus', 'internalActiveRules', 'activeFilterTags', | |
'crhBool', 'crnhBool', 'awaitingBool'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_6 Final compute scored notes elapsed time: 85.28 secs (1.42 mins) | |
INFO:birdwatch.scorer:Postprocessing output for MFGroupScorer_6 | |
INFO:birdwatch.run_scoring:MFGroupScorer_4 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_4 run_scorer_parallelizable: Loading data elapsed time: 28.52 secs (0.48 mins) | |
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_4 set to: 4 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_4. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.1340135782957077 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10468088090419769 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Original noteScores length: 1783629 | |
INFO:birdwatch.scorer: Final noteScores length: 37140 | |
INFO:birdwatch.scorer: Ratings after group filter: 1940497 | |
INFO:birdwatch.scorer:MFGroupScorer_4 Filter input elapsed time: 48.52 secs (0.81 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note: | |
Num Ratings: 1551079, Num Unique Notes Rated: 61887, Num Unique Raters: 19569 | |
INFO:birdwatch.scorer:MFGroupScorer_4 Prepare ratings elapsed time: 0.80 secs (0.01 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)` | |
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey] | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in score_final, at line 1190: noteScores = noteScores.merge( | |
PandasTypeError: Output mismatch on numFinalRoundRatings: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_6 Postprocess output elapsed time: 60.28 secs (1.00 mins) | |
INFO:birdwatch.scorer: Original noteScores length: 1783629 | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 10027 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 39083 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 10836 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 10027 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 985834 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 985834 | |
INFO:birdwatch.scorer: Final noteScores length: 4010 | |
INFO:birdwatch.run_scoring:MFGroupScorer_3 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 10027, Notes: 61814 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 15.948393567800174 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 98.31794155779396 | |
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 16.100692228386272 | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.matrix_factorization:epoch 0 0.1481984555721283 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.12249139696359634 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in score_final, at line 1190: noteScores = noteScores.merge( | |
PandasTypeError: Output mismatch on numFinalRoundRatings: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_5 Postprocess output elapsed time: 63.35 secs (1.06 mins) | |
INFO:birdwatch.matrix_factorization:epoch 20 0.09776052832603455 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06773079186677933 | |
INFO:birdwatch.run_scoring:MFGroupScorer_2 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.matrix_factorization:epoch 40 0.0944298654794693 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06429769098758698 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09417957067489624 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06392911076545715 | |
INFO:birdwatch.matrix_factorization:Num epochs: 72 | |
INFO:birdwatch.matrix_factorization:epoch 72 0.09415718913078308 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0638841763138771 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17005860805511475 | |
INFO:birdwatch.scorer:MFGroupScorer_4 Final helpfulness-filtered MF elapsed time: 7.28 secs (0.12 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_4 final scoring, about to call diligence with 985834 final round ratings. | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key] | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key] | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
                    noteId  internalNoteIntercept  internalNoteFactor1  \
0      1682453260345528328              -2.110976             2.561753
1      1708988303926862198               1.531360             0.462310
2      1711418483634741262              -0.972946             1.663695
3      1711435360398356934              -0.336547             1.692834
4      1711499136887931319              -2.186830             0.124414
...                    ...                    ...                  ...
61102  1820850280050618650               0.356350             1.048785
61103  1820850814115557481               0.374662             1.046702
61104  1806138066056404994              -0.530391            -0.791771
61105  1790301132448833854              -0.180079             0.688214
61106  1704681040659521639              -0.288420            -0.803593
       internalNoteInterceptRound2
0                        -2.110976
1                         1.531360
2                        -0.972946
3                        -0.336547
4                        -2.186830
...                            ...
61102                     0.356350
61103                     0.374662
61104                    -0.530391
61105                    -0.180079
61106                    -0.288420
[61107 rows x 4 columns],
raterInitState: | |
                                      raterParticipantId  \
0      000045A5FA0CF004F68CBF2913506C37D540CF48522D33...
1      00029D1FDD352D79B5073189C3F2BDF6377581F50D66C1...
2      00053CDCAC04E3692F4A01305C8F3D093CCE221157D539...
3      0005983E6E18862483AB372C5B61FEBC1F8A573E7701F9...
4      000C92F6B8127DF83BE8430A54BCA7ECF08071EC8E00E2...
...                                                  ...
39078  FFF3E935633C6870DE7674D0681C5821BC408073C84A36...
39079  FFF6DBEDE9ED4DC6A61291E33742D1805155E385475E43...
39080  FFF89590FF300D0348631F2F16AA908F663A888A3F82E0...
39081  FFFA43EFB0AAB3BFD273666FF123BFE69D863B9A2F5E44...
39082  FFFC011F23086D8153F0A3FF336F33EE80521EC35F9ACD...
       internalRaterIntercept  internalRaterFactor1  internalRaterReputation  \
0                         NaN                   NaN                      NaN
1                         NaN                   NaN                      NaN
2                         NaN                   NaN                      NaN
3                         NaN                   NaN                      NaN
4                         NaN                   NaN                      NaN
...                       ...                   ...                      ...
39078                     NaN                   NaN                      NaN
39079                     NaN                   NaN                      NaN
39080                0.281992             -0.450381                 0.534745
39081                     NaN                   NaN                      NaN
39082                     NaN                   NaN                      NaN
       internalRaterInterceptRound2
0                               NaN
1                               NaN
2                               NaN
3                               NaN
4                               NaN
...                             ...
39078                           NaN
39079                           NaN
39080                      0.281992
39081                           NaN
39082                           NaN
[39083 rows x 5 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 10027, vs. num we are initializing: 39083 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 10027 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 10027, vs. num we are initializing: 39083 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 10027 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 10027, vs. num we are initializing: 39083 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 10027 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 61814, vs. num we are initializing: 61107 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 60608 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 1206 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 61814, vs. num we are initializing: 61107 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 60608 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 1206 | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=5.008080 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.467012 | time=1.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.439165 | time=2.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.437090 | time=3.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=115 | loss=2.436965 | time=4.0s | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 10027, vs. num we are initializing: 39083 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 10027 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.408247 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.387106 | time=1.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.386858 | time=1.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.386858 | time=1.9s | |
INFO:birdwatch.diligence_model:Low diligence final loss: 0.3869 | |
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object') | |
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1', | |
'internalNoteFactor1_max', 'internalNoteFactor1_median', | |
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig', | |
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig', | |
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_4 Low Diligence Reputation Model elapsed time: 7.50 secs (0.13 mins) | |
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFGroupScorer_4 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 494: noteStats = tagAggregates.merge(noteStats, on=c.noteIdKey, how="outer") | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:compute_scored_notes: compute tag aggregates elapsed time: 3.99 secs (0.07 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:82: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals.drop(columns_to_drop, inplace=True, axis=1) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:84: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ( | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:91: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:95: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:99: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
INFO:birdwatch.run_scoring:MFGroupScorer_3 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_3 run_scorer_parallelizable: Loading data elapsed time: 26.55 secs (0.44 mins) | |
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_3 set to: 4 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_3. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 499: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on num_voters_interval: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.constants:compute_scored_notes: compute incorrect aggregates elapsed time: 4.06 secs (0.07 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.63 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.run_scoring:MFGroupScorer_2 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_2 run_scorer_parallelizable: Loading data elapsed time: 26.67 secs (0.44 mins) | |
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_2 set to: 4 | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.72 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_2. Original rating length: 120945188 | |
INFO:birdwatch.scorer: Ratings after topic filter: 120945188 | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.61 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.61 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:epoch 60 0.1338634043931961 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1044660210609436 | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.75 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRHInertia (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRHInertia (v1.0) elapsed time: 0.98 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRHInertia (v1.0) elapsed time: 1.61 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: TagFilter (v1.0) | |
INFO:birdwatch.scoring_rules:Candidate notes prior to tag filtering: 1780422 | |
INFO:birdwatch.scoring_rules:Checking note tags: | |
INFO:birdwatch.scoring_rules:notHelpfulOther | |
INFO:birdwatch.scoring_rules:notHelpfulIncorrect | |
INFO:birdwatch.scoring_rules:notHelpfulSourcesMissingOrUnreliable | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculationOrBias | |
INFO:birdwatch.scoring_rules:notHelpfulMissingKeyPoints | |
INFO:birdwatch.scoring_rules:notHelpfulOutdated | |
INFO:birdwatch.scoring_rules:notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:outlier filtering disabled for tag: notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:notHelpfulArgumentativeOrBiased | |
INFO:birdwatch.scoring_rules:notHelpfulOffTopic | |
INFO:birdwatch.scoring_rules:notHelpfulSpamHarassmentOrAbuse | |
INFO:birdwatch.scoring_rules:notHelpfulIrrelevantSources | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculation | |
INFO:birdwatch.scoring_rules:notHelpfulNoteNotNeeded | |
INFO:birdwatch.scoring_rules:Total {note, tag} pairs where tag filter logic triggered: 3970 | |
INFO:birdwatch.scoring_rules:Total unique notes impacted by tag filtering: 2115 | |
INFO:birdwatch.constants:Calling score_notes: TagFilter (v1.0) elapsed time: 3.07 secs (0.05 mins) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scoring_rules.py, in apply_scoring_rules, at line 1099: noteColumns = noteColumns.merge( | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:Applying scoring rule: TagFilter (v1.0) elapsed time: 3.74 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: CRHSuperThreshold (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: CRHSuperThreshold (v1.0) elapsed time: 0.12 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: CRHSuperThreshold (v1.0) elapsed time: 0.77 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: ElevatedCRHInertia (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: ElevatedCRHInertia (v1.0) elapsed time: 0.94 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: ElevatedCRHInertia (v1.0) elapsed time: 1.56 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterIncorrect (v1.0) | |
INFO:birdwatch.scoring_rules:Total notes impacted by incorrect filtering: 561 | |
INFO:birdwatch.constants:Calling score_notes: FilterIncorrect (v1.0) elapsed time: 2.87 secs (0.05 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterIncorrect (v1.0) elapsed time: 3.48 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLowDiligence (v1.0) | |
INFO:birdwatch.scoring_rules:Total notes impacted by low diligence filtering: 12594 | |
INFO:birdwatch.constants:Calling score_notes: FilterLowDiligence (v1.0) elapsed time: 2.84 secs (0.05 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterLowDiligence (v1.0) elapsed time: 3.49 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLargeFactor (v1.0) | |
INFO:birdwatch.scoring_rules:Total notes impacted by large factor filtering: 17 | |
INFO:birdwatch.constants:Calling score_notes: FilterLargeFactor (v1.0) elapsed time: 0.83 secs (0.01 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterLargeFactor (v1.0) elapsed time: 1.50 secs (0.02 mins) | |
INFO:birdwatch.matrix_factorization:Num epochs: 70 | |
INFO:birdwatch.matrix_factorization:epoch 70 0.1338496208190918 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1044570803642273 | |
INFO:birdwatch.constants:Pseudo: fit all notes with raters constant elapsed time: 202.19 secs (3.37 mins) | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings after group filter: 6272990 | |
INFO:birdwatch.scorer:MFGroupScorer_3 Filter input elapsed time: 45.33 secs (0.76 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.scorer: Ratings after group filter: 1508019 | |
INFO:birdwatch.scorer:MFGroupScorer_2 Filter input elapsed time: 43.50 secs (0.73 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note: | |
Num Ratings: 934408, Num Unique Notes Rated: 71120, Num Unique Raters: 16159 | |
INFO:birdwatch.scorer:MFGroupScorer_2 Prepare ratings elapsed time: 0.50 secs (0.01 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)` | |
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey] | |
INFO:birdwatch.pseudo_raters:------------------ | |
INFO:birdwatch.pseudo_raters:Re-scoring all notes with extra rating added: {'raterParticipantId': '-3', 'raterIndex': 382562, 'internalRaterIntercept': -0.44769228, 'internalRaterFactor1': 0.98411465, 'helpfulNum': 1.0} | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 7425 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 42685 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 8606 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 7425 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 668701 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 668701 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 382561, Notes: 1226896 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note: | |
Num Ratings: 5618874, Num Unique Notes Rated: 172529, Num Unique Raters: 68132 | |
INFO:birdwatch.scorer:MFGroupScorer_3 Prepare ratings elapsed time: 2.84 secs (0.05 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 7425, Notes: 71036 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 9.413550875612366 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 90.06074074074074 | |
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 9.56399101071799 | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.matrix_factorization:epoch 0 0.15567269921302795 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11590553820133209 | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INIT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/pseudo_raters.py, in _check_note_parameters_same, at line 90: assert (noteParamsFromNewModel == self.noteParams).all().all() | |
PandasTypeError: Type expectation mismatch on noteId: found=bool expected=int64 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.10539298504590988 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07302097231149673 | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)` | |
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey] | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10218331217765808 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0704675167798996 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10186641663312912 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07033660262823105 | |
INFO:birdwatch.matrix_factorization:Num epochs: 62 | |
INFO:birdwatch.matrix_factorization:epoch 62 0.10186642408370972 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07036734372377396 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17347291111946106 | |
INFO:birdwatch.scorer:MFGroupScorer_2 Final helpfulness-filtered MF elapsed time: 3.50 secs (0.06 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_2 final scoring, about to call diligence with 668701 final round ratings. | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy |