Created January 8, 2025 22:07
community notes execution log
$ python3 main.py -e ../input/userEnrollment-00000.tsv -n ../input/notes-00000.tsv -r ../input/ratings -s ../input/noteStatusHistory-00000.tsv -o ../output --seed 0 --parallel
INFO:birdwatch.runner:scorer python version: 3.10.12 (main, Nov 6 2024, 20:22:13) [GCC 11.4.0]
INFO:birdwatch.runner:scorer pandas version: 2.2.2
INFO:birdwatch.runner:beginning scorer execution
INFO:birdwatch.process_data:Timestamp of latest rating in data: 2025-01-04 01:01:21.258000
INFO:birdwatch.process_data:Timestamp of latest note in data: 2025-01-04 01:01:14.426000
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_status_history.py, in merge_note_info, at line 31: newNoteStatusHistory = oldNoteStatusHistory.merge(
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_status_history.py, in merge_note_info, at line 31: newNoteStatusHistory = oldNoteStatusHistory.merge(
PandasTypeError: Output mismatch on createdAtMillis: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on createdAtMillis_notes: result=float64 expected=int64 (allowed)
INFO:birdwatch.note_status_history:total notes added to noteStatusHistory: 58610
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_status_history.py, in merge_note_info, at line 57: newNoteStatusHistory[[c.noteIdKey, c.createdAtMillisKey]].merge(
PandasTypeError: Input mismatch on createdAtMillis: left=float64 vs right=int64 (allowed)
PandasTypeError: Merge key mismatch on createdAtMillis: left=float64 vs right=int64 (allowed)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/process_data.py, in _filter_misleading_notes, at line 270: ratings = ratings.merge(
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed)
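The recurring `float64` vs `int64` mismatches above are consistent with how pandas handles missing values. When a left (or outer) merge, or a concat, leaves some rows without a match, those rows get `NaN`. A NumPy `int64` column cannot hold `NaN`, so pandas upcasts the column to `float64`. The sketch below reproduces this with made-up toy frames. It also shows one common workaround, casting to pandas' nullable `Int64` dtype before merging. This is an illustration only: the log marks these mismatches as `(allowed)` and does not show how the scorer itself handles them.

```python
import pandas as pd

# Toy stand-ins for a status-history frame and a notes frame (hypothetical data).
history = pd.DataFrame({"noteId": [1, 2, 3]})
notes = pd.DataFrame({"noteId": [1, 2], "createdAtMillis": [1000, 2000]})

# Left merge: noteId 3 has no match, so its createdAtMillis becomes NaN
# and the whole column is upcast from int64 to float64.
merged = history.merge(notes, on="noteId", how="left")
print(merged["createdAtMillis"].dtype)  # float64

# Casting to the nullable Int64 extension dtype first keeps integer values;
# the missing entry is stored as pd.NA instead of NaN.
notes_nullable = notes.astype({"createdAtMillis": "Int64"})
merged_nullable = history.merge(notes_nullable, on="noteId", how="left")
print(merged_nullable["createdAtMillis"].dtype)  # Int64
```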
INFO:birdwatch.process_data:Preprocess Data: Filter misleading notes, starting with 118977893 ratings on 1566003 notes
INFO:birdwatch.process_data: Keeping 85792913 ratings on 1051136 misleading notes
INFO:birdwatch.process_data: Keeping 8763256 ratings on 149656 deleted notes that were previously scored (in note status history)
INFO:birdwatch.process_data: Removing 58796 ratings on 2914 older notes that aren't deleted, but are not-misleading.
INFO:birdwatch.process_data: Removing 9540 ratings on 1128 notes that were deleted and not in note status history (e.g. old).
INFO:birdwatch.process_data:Num Ratings: 118909557, Num Unique Notes Rated: 1561961, Num Unique Raters: 1040729
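The filtering summary reconciles once you subtract only the two "Removing" lines from the starting counts. The two "Keeping" lines do not form a complete breakdown: together they fall well short of the final total, and the log does not name the remaining kept category. A quick arithmetic check using only the numbers printed above:

```python
# Counts copied from the process_data log lines above.
start_ratings, start_notes = 118_977_893, 1_566_003
removed_ratings = 58_796 + 9_540   # the two "Removing" lines
removed_notes = 2_914 + 1_128
final_ratings, final_notes = 118_909_557, 1_561_961

# Only the "Removing" lines reduce the totals.
assert start_ratings - removed_ratings == final_ratings
assert start_notes - removed_notes == final_notes

# The two "Keeping" lines cover only part of what survives.
kept_misleading = 85_792_913
kept_deleted_scored = 8_763_256
unlisted = final_ratings - kept_misleading - kept_deleted_scored
print(unlisted)  # 24353388 ratings fall in neither "Keeping" bucket
```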
INFO:birdwatch.process_data:Called filter_input_data_for_testing.
Notes: 1554031, Ratings: 118909557. Max note createdAt: 2025-01-04 01:01:14.426000; Max rating createAt: 2025-01-04 01:01:21.258000
INFO:birdwatch.process_data:After filtering notes and ratings after particular timestamp (=None).
Notes: 1554031, Ratings: 118909557. Max note createdAt: 2025-01-04 01:01:14.426000; Max rating createAt: 2025-01-04 01:01:21.258000
INFO:birdwatch.process_data:After filtering ratings after first status (plus None hours) for notes created in last 14 days.
Notes: 1554031, Ratings: 118909557. Max note createdAt: 2025-01-04 01:01:14.426000; Max rating createAt: 2025-01-04 01:01:21.258000
INFO:birdwatch.process_data:After filtering prescoring notes and ratings to simulate a delay of None hours:
Notes: 1554031, Ratings: 118909557. Max note createdAt: 2025-01-04 01:01:14.426000; Max rating createAt: 2025-01-04 01:01:21.258000
INFO:birdwatch.constants:Compute pair counts dict elapsed time: 10826.08 secs (180.43 mins)
INFO:birdwatch.constants:Compute PMI and minSim elapsed time: 2308.48 secs (38.47 mins)
INFO:birdwatch.constants:Delete unneeded pairs from pairCountsDict elapsed time: 273.86 secs (4.56 mins)
INFO:birdwatch.constants:Aggregate into cliques by post selection similarity elapsed time: 13.96 secs (0.23 mins)
INFO:birdwatch.constants:Compute Post Selection Similarity elapsed time: 13622.11 secs (227.04 mins)
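Every `birdwatch.constants` line in this log has the shape `<label> elapsed time: <secs> secs (<mins> mins)`, where minutes are seconds divided by 60 (for example, 10826.08 / 60 ≈ 180.43). The log does not show the scorer's own timing helper. The following is a hypothetical context manager, named `time_block` here only for illustration, that emits lines in the same format through Python's `logging` module.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("birdwatch.constants")


def format_elapsed(label: str, seconds: float) -> str:
    """Render a duration in the same shape as the log lines above."""
    return f"{label} elapsed time: {seconds:.2f} secs ({seconds / 60:.2f} mins)"


@contextmanager
def time_block(label: str):
    """Hypothetical helper: log how long the enclosed block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        logger.info(format_elapsed(label, time.perf_counter() - start))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    with time_block("Toy workload"):
        sum(range(1_000_000))
```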
INFO:birdwatch.run_scoring:logging environment variables
INFO:birdwatch.run_scoring:notes total RAM: 122768821 bytes (0.123 GB)
column dtype RAM
0 noteId int64 12432248
1 noteAuthorParticipantId object 12432248
2 createdAtMillis int64 12432248
3 tweetId object 12432248
4 classification object 12432248
5 believable category 1554155
6 harmful category 1554155
7 validationDifficulty category 1554155
8 misleadingOther Int8 3108062
9 misleadingFactualError Int8 3108062
10 misleadingManipulatedMedia Int8 3108062
11 misleadingOutdatedInformation Int8 3108062
12 misleadingMissingImportantContext Int8 3108062
13 misleadingUnverifiedClaimAsFact Int8 3108062
14 misleadingSatire Int8 3108062
15 notMisleadingOther Int8 3108062
16 notMisleadingFactuallyCorrect Int8 3108062
17 notMisleadingOutdatedButNotWhenWritten Int8 3108062
18 notMisleadingClearlySatire Int8 3108062
19 notMisleadingPersonalOpinion Int8 3108062
20 trustworthySources Int8 3108062
21 summary object 12432248
22 isMediaNote Int8 3108062
INFO:birdwatch.run_scoring:ratings total RAM: 11296408047 bytes (11.296 GB)
column dtype RAM
0 noteId int64 951276456
1 raterParticipantId object 951276456
2 createdAtMillis int64 951276456
3 version Int8 237819114
4 agree Int8 237819114
5 disagree Int8 237819114
6 helpful Int8 237819114
7 notHelpful Int8 237819114
8 helpfulnessLevel category 118909689
9 helpfulOther Int8 237819114
10 helpfulInformative Int8 237819114
11 helpfulClear Int8 237819114
12 helpfulEmpathetic Int8 237819114
13 helpfulGoodSources Int8 237819114
14 helpfulUniqueContext Int8 237819114
15 helpfulAddressesClaim Int8 237819114
16 helpfulImportantContext Int8 237819114
17 helpfulUnbiasedLanguage Int8 237819114
18 notHelpfulOther Int8 237819114
19 notHelpfulIncorrect Int8 237819114
20 notHelpfulSourcesMissingOrUnreliable Int8 237819114
21 notHelpfulOpinionSpeculationOrBias Int8 237819114
22 notHelpfulMissingKeyPoints Int8 237819114
23 notHelpfulOutdated Int8 237819114
24 notHelpfulHardToUnderstand Int8 237819114
25 notHelpfulArgumentativeOrBiased Int8 237819114
26 notHelpfulOffTopic Int8 237819114
27 notHelpfulSpamHarassmentOrAbuse Int8 237819114
28 notHelpfulIrrelevantSources Int8 237819114
29 notHelpfulOpinionSpeculation Int8 237819114
30 notHelpfulNoteNotNeeded Int8 237819114
31 ratedOnTweetId int64 951276456
32 helpfulNum float64 951276456
INFO:birdwatch.run_scoring:noteStatusHistory total RAM: 225817062 bytes (0.226 GB)
column dtype RAM
0 noteId int64 14004048
1 noteAuthorParticipantId object 14004048
2 createdAtMillis float64 14004048
3 timestampMillisOfFirstNonNMRStatus float64 14004048
4 firstNonNMRStatus category 1750630
5 timestampMillisOfCurrentStatus float64 14004048
6 currentStatus category 1750638
7 timestampMillisOfLatestNonNMRStatus float64 14004048
8 mostRecentNonNMRStatus category 1750630
9 timestampMillisOfStatusLock float64 14004048
10 lockedStatus category 1750638
11 timestampMillisOfRetroLock float64 14004048
12 currentCoreStatus category 1750638
13 currentExpansionStatus category 1750638
14 currentGroupStatus category 1750638
15 currentDecidedBy category 1751254
16 currentModelingGroup float64 14004048
17 timestampMillisOfMostRecentStatusChange float64 14004048
18 timestampMillisOfNmrDueToMinStableCrhTime float64 14004048
19 currentMultiGroupStatus category 1750638
20 currentModelingMultiGroup float64 14004048
21 timestampMinuteOfFinalScoringOutput float64 14004048
22 timestampMillisOfFirstNmrDueToMinStableCrhTime float64 14004048
23 classification object 14004048
INFO:birdwatch.run_scoring:userEnrollment total RAM: 59362560 bytes (0.059 GB)
column dtype RAM
0 participantId object 8331584
1 enrollmentState object 8331584
2 successfulRatingNeededToEarnIn int64 8331584
3 timestampOfLastStateChange int64 8331584
4 timestampOfLastEarnOut float64 8331584
5 modelingPopulation category 1041472
6 modelingGroup float64 8331584
7 numberOfTimesEarnedOut int64 8331584
INFO:birdwatch.constants:Logging Prescoring Inputs Initial RAM usage elapsed time: 0.04 secs (0.00 mins)
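The per-column RAM figures match the arithmetic of a shallow `DataFrame.memory_usage(index=False)` call:

- Every `int64`, `float64`, and `object` column costs 8 bytes per row. For `object` columns only the pointer is counted, not the strings.
- A nullable `Int8` column costs 2 bytes per row: 1 data byte plus 1 mask byte.
- A `category` column with few categories costs about 1 byte per row plus a small overhead for the category labels.

For example, 12,432,248 / 8 = 1,554,031, the note count printed earlier, and 3,108,062 = 2 × 1,554,031. The totals use decimal gigabytes (bytes / 10^9). Whether the scorer uses exactly this call is an assumption. The sketch reproduces the same arithmetic on a small, made-up frame.

```python
import numpy as np
import pandas as pd

n = 1_000  # toy row count (hypothetical)
df = pd.DataFrame({
    "noteId": np.arange(n, dtype=np.int64),
    "noteAuthorParticipantId": [f"author{i}" for i in range(n)],
    "helpful": pd.array([1, 0] * (n // 2), dtype="Int8"),
    "helpfulnessLevel": pd.Categorical(["HELPFUL", "NOT_HELPFUL"] * (n // 2)),
})

# Shallow accounting: object columns report pointer size only.
usage = df.memory_usage(index=False)
table = pd.DataFrame({"column": usage.index, "dtype": df.dtypes.values, "RAM": usage.values})
print(table)
print(f"total RAM: {usage.sum()} bytes ({usage.sum() / 1e9:.3f} GB)")
```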
INFO:birdwatch.constants:Get Note Topics: Prepare Post Text elapsed time: 16.54 secs (0.28 mins)
INFO:birdwatch.topic_model: Notes unassigned due to multiple matches: 1736
INFO:birdwatch.constants:Get Note Topics: Make Seed Labels elapsed time: 83.04 secs (1.38 mins)
INFO:birdwatch.topic_model: Initial vocabulary length: 2175468
INFO:birdwatch.topic_model: Total tokens to filter: 13
INFO:birdwatch.topic_model: Total identified stopwords: 1704
INFO:birdwatch.constants:Get Note Topics: Get Stop Words elapsed time: 86.10 secs (1.43 mins)
INFO:birdwatch.constants:Get Note Topics: Train Model elapsed time: 349.64 secs (5.83 mins)
INFO:birdwatch.topic_model:Assigning notes to topics:
INFO:birdwatch.constants:Get Note Topics: Predict elapsed time: 80.17 secs (1.34 mins)
INFO:birdwatch.topic_model: Balanced accuracy on raw predictions: 0.7090347219209823
INFO:birdwatch.topic_model: Post Topic assignment results: [888954 26545 54077 2347]
INFO:birdwatch.topic_model: Note Topic assignment results:
noteTopic
GazaConflict 112059
UkraineConflict 45446
MessiRonaldo 4027
Name: count, dtype: int64
INFO:birdwatch.constants:Get Note Topics: Merge and assign predictions elapsed time: 1.66 secs (0.03 mins)
INFO:birdwatch.constants:Note Topic Assignment elapsed time: 633.41 secs (10.56 mins)
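Balanced accuracy, reported above for the raw topic predictions, is the unweighted mean of per-class recall. A large majority class therefore cannot dominate it the way it dominates plain accuracy, which matters here because the post-level counts above are heavily skewed toward the first bucket (888,954 of 971,923). The dependency-free sketch below computes the same quantity as scikit-learn's `balanced_accuracy_score` with default arguments. The labels are made up.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall: the fraction of each true class predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy labels: 0 = "no topic", 1..3 = hypothetical topic ids.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 2, 0]
print(balanced_accuracy(y_true, y_pred))  # 0.625, versus 0.8 plain accuracy
```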
INFO:birdwatch.run_scoring:ratings summary before PSS: fac11c8135957e8df3a12e4196a84e59731b1af5052dd2720aadf606a52da80d
INFO:birdwatch.run_scoring:Post Selection Similarity Prescoring: begin with 118909557 ratings.
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/post_selection_similarity.py, in filter_ratings_by_post_selection_similarity, at line 85: ratings.merge(
PandasTypeError: Output mismatch on postSelectionValue: result=float64 expected=int64 (allowed)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/post_selection_similarity.py, in filter_ratings_by_post_selection_similarity, at line 85: ratings.merge(
PandasTypeError: Input mismatch on postSelectionValue: left=float64 vs right=int64 (allowed)
PandasTypeError: Output mismatch on postSelectionValue_note_author: result=float64 expected=int64 (allowed)
/home/ubuntu/communitynotes/sourcecode/scoring/post_selection_similarity.py:111: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratingsWithPostSelectionSimilarityValue.sort_values(
/home/ubuntu/communitynotes/sourcecode/scoring/post_selection_similarity.py:114: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratingsWithPostSelectionSimilarityValue.drop_duplicates(
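Both warnings above are pandas' `SettingWithCopyWarning`. It fires when an in-place operation targets a frame that pandas suspects may be a slice of another frame. The log does not reproduce the source at `post_selection_similarity.py:111` and `:114`, so the exact trigger here is an assumption. A typical pattern, and the usual fix of taking an explicit `.copy()` and reassigning non-inplace results, looks like this (toy data, illustrative column names):

```python
import warnings

import pandas as pd

# Toy ratings frame (hypothetical data).
ratings = pd.DataFrame({"noteId": [3, 1, 2, 1, 2],
                        "value": [0.3, 0.5, 0.2, 0.9, 0.05]})

# Risky pattern: boolean filtering yields a frame pandas may flag as a possible
# slice of `ratings`; an in-place edit on that frame can then warn.
subset = ratings[ratings["value"] > 0.15]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    subset.sort_values("noteId", inplace=True)
# Under pandas 2.x defaults this typically records SettingWithCopyWarning;
# with copy-on-write enabled it records nothing.
print([w.category.__name__ for w in caught])

# Safer pattern: own the data explicitly, then reassign non-inplace results.
subset = ratings[ratings["value"] > 0.15].copy()
subset = subset.sort_values(["noteId", "value"]).drop_duplicates("noteId")
print(subset)
```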
INFO:birdwatch.run_scoring:Post Selection Similarity Prescoring: 118317340 ratings remaining.
INFO:birdwatch.constants:Filter ratings by Post Selection Similarity elapsed time: 265.63 secs (4.43 mins)
INFO:birdwatch.run_scoring:ratings summary after PSS: c2198f60420b6359f2488810e7f6425a3f37c27712ac4c0b8d5b149dfbe06904
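The `ratings summary` values before and after PSS are 64 hexadecimal characters, the length of a SHA-256 hex digest. A digest like this makes it easy to confirm that two runs saw identical inputs. This log does not show how `run_scoring` computes it. One plausible approach, shown here as an assumption rather than the scorer's method, hashes each row with `pandas.util.hash_pandas_object` and feeds the bytes to `hashlib.sha256`. The helper name `frame_digest` is hypothetical.

```python
import hashlib

import pandas as pd


def frame_digest(df: pd.DataFrame) -> str:
    """Hypothetical helper: SHA-256 over the per-row uint64 hashes of a DataFrame."""
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    return hashlib.sha256(row_hashes.to_numpy().tobytes()).hexdigest()


ratings = pd.DataFrame({"noteId": [1, 2], "helpfulNum": [1.0, 0.0]})
before = frame_digest(ratings)
after = frame_digest(ratings[ratings["helpfulNum"] > 0])  # a filtered frame hashes differently
print(len(before), before != after)
```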
INFO:birdwatch.run_scoring:Error converting user IDs to ints. IDs will remain as strings. ValueError("invalid literal for int() with base 10: 'F35972BBD2F99515FD974E9C7AFD899970F2E4A59115132FAD59EBCB74C0ABE6'")
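The message above shows participant IDs that are 64-character hexadecimal strings. Python's base-10 `int()` rejects these, so the run keeps the IDs as strings. Below is a sketch of that try-then-fall-back pattern. The function name is hypothetical, and the scorer's actual conversion code is not shown in this log.

```python
import logging

import pandas as pd

logger = logging.getLogger("birdwatch.run_scoring")


def try_convert_ids(ids: pd.Series) -> pd.Series:
    """Hypothetical helper: return int64 IDs if every value parses as a base-10
    integer; otherwise log the failure and return the IDs unchanged."""
    try:
        return ids.astype("int64")
    except ValueError as e:
        logger.info(f"Error converting user IDs to ints. IDs will remain as strings. {e!r}")
        return ids


numeric = try_convert_ids(pd.Series(["123", "456"]))
hexed = try_convert_ids(pd.Series(["F35972BBD2F99515FD974E9C7AFD8999"]))
print(numeric.dtype, hexed.dtype)
```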
INFO:birdwatch.run_scoring:notes total RAM: 122768821 bytes (0.123 GB)
column dtype RAM
0 noteId int64 12432248
1 noteAuthorParticipantId object 12432248
2 createdAtMillis int64 12432248
3 tweetId object 12432248
4 classification object 12432248
5 believable category 1554155
6 harmful category 1554155
7 validationDifficulty category 1554155
8 misleadingOther Int8 3108062
9 misleadingFactualError Int8 3108062
10 misleadingManipulatedMedia Int8 3108062
11 misleadingOutdatedInformation Int8 3108062
12 misleadingMissingImportantContext Int8 3108062
13 misleadingUnverifiedClaimAsFact Int8 3108062
14 misleadingSatire Int8 3108062
15 notMisleadingOther Int8 3108062
16 notMisleadingFactuallyCorrect Int8 3108062
17 notMisleadingOutdatedButNotWhenWritten Int8 3108062
18 notMisleadingClearlySatire Int8 3108062
19 notMisleadingPersonalOpinion Int8 3108062
20 trustworthySources Int8 3108062
21 summary object 12432248
22 isMediaNote Int8 3108062
INFO:birdwatch.run_scoring:ratings total RAM: 13133224872 bytes (13.133 GB)
column dtype RAM
0 noteId int64 946538720
1 raterParticipantId object 946538720
2 createdAtMillis int64 946538720
3 version Int8 236634680
4 agree Int8 236634680
5 disagree Int8 236634680
6 helpful Int8 236634680
7 notHelpful Int8 236634680
8 helpfulnessLevel category 118317472
9 helpfulOther Int8 236634680
10 helpfulInformative Int8 236634680
11 helpfulClear Int8 236634680
12 helpfulEmpathetic Int8 236634680
13 helpfulGoodSources Int8 236634680
14 helpfulUniqueContext Int8 236634680
15 helpfulAddressesClaim Int8 236634680
16 helpfulImportantContext Int8 236634680
17 helpfulUnbiasedLanguage Int8 236634680
18 notHelpfulOther Int8 236634680
19 notHelpfulIncorrect Int8 236634680
20 notHelpfulSourcesMissingOrUnreliable Int8 236634680
21 notHelpfulOpinionSpeculationOrBias Int8 236634680
22 notHelpfulMissingKeyPoints Int8 236634680
23 notHelpfulOutdated Int8 236634680
24 notHelpfulHardToUnderstand Int8 236634680
25 notHelpfulArgumentativeOrBiased Int8 236634680
26 notHelpfulOffTopic Int8 236634680
27 notHelpfulSpamHarassmentOrAbuse Int8 236634680
28 notHelpfulIrrelevantSources Int8 236634680
29 notHelpfulOpinionSpeculation Int8 236634680
30 notHelpfulNoteNotNeeded Int8 236634680
31 ratedOnTweetId int64 946538720
32 helpfulNum float64 946538720
33 postSelectionValue float64 946538720
34 postSelectionValue_note_author float64 946538720
INFO:birdwatch.run_scoring:noteStatusHistory total RAM: 225817062 bytes (0.226 GB)
column dtype RAM
0 noteId int64 14004048
1 noteAuthorParticipantId object 14004048
2 createdAtMillis float64 14004048
3 timestampMillisOfFirstNonNMRStatus float64 14004048
4 firstNonNMRStatus category 1750630
5 timestampMillisOfCurrentStatus float64 14004048
6 currentStatus category 1750638
7 timestampMillisOfLatestNonNMRStatus float64 14004048
8 mostRecentNonNMRStatus category 1750630
9 timestampMillisOfStatusLock float64 14004048
10 lockedStatus category 1750638
11 timestampMillisOfRetroLock float64 14004048
12 currentCoreStatus category 1750638
13 currentExpansionStatus category 1750638
14 currentGroupStatus category 1750638
15 currentDecidedBy category 1751254
16 currentModelingGroup float64 14004048
17 timestampMillisOfMostRecentStatusChange float64 14004048
18 timestampMillisOfNmrDueToMinStableCrhTime float64 14004048
19 currentMultiGroupStatus category 1750638
20 currentModelingMultiGroup float64 14004048
21 timestampMinuteOfFinalScoringOutput float64 14004048
22 timestampMillisOfFirstNmrDueToMinStableCrhTime float64 14004048
23 classification object 14004048
INFO:birdwatch.run_scoring:userEnrollment total RAM: 59362560 bytes (0.059 GB)
column dtype RAM
0 participantId object 8331584
1 enrollmentState object 8331584
2 successfulRatingNeededToEarnIn int64 8331584
3 timestampOfLastStateChange int64 8331584
4 timestampOfLastEarnOut float64 8331584
5 modelingPopulation category 1041472
6 modelingGroup float64 8331584
7 numberOfTimesEarnedOut int64 8331584
INFO:birdwatch.constants:Logging Prescoring Inputs RAM usage before _run_scorers elapsed time: 0.04 secs (0.00 mins)
INFO:birdwatch.run_scoring:Starting parallel scorer execution with 23 scorers.
Patching pandas
Pairs dict used 42.949673056GB RAM at max
Pairs dict used 42.949673056GB RAM after deleted unneeded pairs
SHELL: /bin/bash
PWD: /home/ubuntu/communitynotes/sourcecode
LOGNAME: ubuntu
XDG_SESSION_TYPE: tty
MOTD_SHOWN: pam
HOME: /home/ubuntu
LANG: C.UTF-8
LS_COLORS: rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
VIRTUAL_ENV: /home/ubuntu/.env
SSH_CONNECTION: 71.168.238.143 59937 172.31.16.171 22
LESSCLOSE: /usr/bin/lesspipe %s %s
XDG_SESSION_CLASS: user
TERM: xterm-256color
LESSOPEN: | /usr/bin/lesspipe %s
USER: ubuntu
SHLVL: 0
XDG_SESSION_ID: 20
VIRTUAL_ENV_PROMPT: (.env)
XDG_RUNTIME_DIR: /run/user/1000
PS1: (.env) \[\e]0;\u@\h: \w\a\]${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$
SSH_CLIENT: 71.168.238.143 59937 22
XDG_DATA_DIRS: /usr/local/share:/usr/share:/var/lib/snapd/desktop
PATH: /home/ubuntu/.env/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
DBUS_SESSION_BUS_ADDRESS: unix:path=/run/user/1000/bus
SSH_TTY: /dev/pts/2
OLDPWD: /home/ubuntu
_: /home/ubuntu/.env/bin/python3
KMP_INIT_AT_FORK: FALSE
KMP_DUPLICATE_LIB_OK: True
[Pipeline] .... (step 1 of 3) Processing UnigramEncoder, total= 1.4min
[Pipeline] ............. (step 2 of 3) Processing tfidf, total= 2.0s
[Pipeline] ........ (step 3 of 3) Processing Classifier, total= 4.4min
INFO:birdwatch.run_scoring:ReputationScorer run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.run_scoring:MFExpansionScorer run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.run_scoring:MFCoreScorer run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.run_scoring:MFExpansionPlusScorer run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.run_scoring:MFGroupScorer_12 run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.run_scoring:MFGroupScorer_13 run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.run_scoring:ReputationScorer run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:ReputationScorer run_scorer_parallelizable: Loading data elapsed time: 22.60 secs (0.38 mins)
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for ReputationScorer set to: 12
INFO:birdwatch.run_scoring:MFCoreScorer run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFCoreScorer run_scorer_parallelizable: Loading data elapsed time: 22.80 secs (0.38 mins)
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFCoreScorer set to: 12
INFO:birdwatch.run_scoring:MFGroupScorer_12 run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFGroupScorer_12 run_scorer_parallelizable: Loading data elapsed time: 23.43 secs (0.39 mins)
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_12 set to: 4
INFO:birdwatch.run_scoring:MFExpansionScorer run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFExpansionScorer run_scorer_parallelizable: Loading data elapsed time: 23.56 secs (0.39 mins)
INFO:birdwatch.run_scoring:MFGroupScorer_13 run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFGroupScorer_13 run_scorer_parallelizable: Loading data elapsed time: 23.57 secs (0.39 mins)
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFExpansionScorer set to: 12
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_13 set to: 8
INFO:birdwatch.run_scoring:MFExpansionPlusScorer run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFExpansionPlusScorer run_scorer_parallelizable: Loading data elapsed time: 24.02 secs (0.40 mins)
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFExpansionPlusScorer set to: 12
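Each scorer process logs that it is "loading data from shared memory". This lets the run's 23 scorers read one copy of the large inputs instead of each receiving its own pickled copy. The log does not reveal the mechanism the scorer uses. As an illustration only, Python's standard `multiprocessing.shared_memory` module can expose a NumPy array under a name that another process attaches to without copying. The helpers `publish` and `attach` are hypothetical.

```python
from multiprocessing import shared_memory

import numpy as np


def publish(arr: np.ndarray):
    """Copy an array into a new named shared-memory block (done once by the parent)."""
    shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
    view = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)
    view[:] = arr
    return shm, (shm.name, arr.shape, arr.dtype.str)


def attach(spec):
    """Attach to an existing block by name (what a worker process would do)."""
    name, shape, dtype = spec
    shm = shared_memory.SharedMemory(name=name)
    return shm, np.ndarray(shape, dtype=np.dtype(dtype), buffer=shm.buf)


if __name__ == "__main__":
    owner, spec = publish(np.arange(10, dtype=np.int64))
    worker_shm, worker_view = attach(spec)  # same process here, for brevity
    print(worker_view.sum())  # 45
    del worker_view           # drop the array before closing its buffer
    worker_shm.close()
    owner.close()
    owner.unlink()            # free the block once every user has closed it
```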
INFO:birdwatch.scorer:Filtering ratings for ReputationScorer. Original rating length: 118317340
INFO:birdwatch.scorer: Ratings after topic filter: 118317340
INFO:birdwatch.scorer:Filtering ratings for MFCoreScorer. Original rating length: 118317340
INFO:birdwatch.scorer: Ratings after topic filter: 118317340
INFO:birdwatch.scorer:Filtering ratings for MFExpansionScorer. Original rating length: 118317340
INFO:birdwatch.scorer: Ratings after topic filter: 118317340
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_12. Original rating length: 118317340
INFO:birdwatch.scorer: Ratings after topic filter: 118317340
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_13. Original rating length: 118317340
INFO:birdwatch.scorer: Ratings after topic filter: 118317340
INFO:birdwatch.scorer:Filtering ratings for MFExpansionPlusScorer. Original rating length: 118317340
INFO:birdwatch.scorer: Ratings after topic filter: 118317340
INFO:birdwatch.scorer: Ratings without assigned group: 0
INFO:birdwatch.scorer: Ratings without assigned group: 0
INFO:birdwatch.scorer: Ratings without assigned group: 0
INFO:birdwatch.scorer: Ratings without assigned group: 0
INFO:birdwatch.scorer: Ratings without assigned group: 0
INFO:birdwatch.scorer: Ratings without assigned group: 0
INFO:birdwatch.scorer: Ratings after group filter: 764628
INFO:birdwatch.scorer:MFGroupScorer_12 Filter input elapsed time: 43.68 secs (0.73 mins)
INFO:birdwatch.mf_base_scorer:seeding with 0
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_12: 3c2ef571abd917eaaf4a18edcc7583077ee3ca715f9a585291e8c377e9613e07
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note:
Num Ratings: 437309, Num Unique Notes Rated: 31027, Num Unique Raters: 6608
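The line above reports the result of a minimum-activity filter that drops raters with fewer than 10 ratings and notes with fewer than 5 raters. Applying one threshold can push rows below the other, and the log does not say whether the scorer repeats the pass. The sketch below is a hypothetical helper, not the scorer's code. It simply iterates until nothing changes, and its toy data shows a case where a second pass is needed.

```python
import pandas as pd


def filter_min_activity(ratings: pd.DataFrame, min_per_rater: int = 10,
                        min_per_note: int = 5) -> pd.DataFrame:
    """Drop sparse notes and raters, repeating until both thresholds hold."""
    while True:
        n_before = len(ratings)
        raters_per_note = ratings.groupby("noteId")["raterParticipantId"].transform("nunique")
        ratings = ratings[raters_per_note >= min_per_note]
        ratings_per_rater = ratings.groupby("raterParticipantId")["noteId"].transform("count")
        ratings = ratings[ratings_per_rater >= min_per_rater]
        if len(ratings) == n_before:
            return ratings


# Toy data with small thresholds: dropping note 3 leaves rater "c" too sparse,
# which in turn leaves note 4 with a single rater, so a second pass is needed.
toy = pd.DataFrame({"noteId": [1, 1, 2, 2, 3, 4, 4],
                    "raterParticipantId": ["a", "b", "a", "b", "c", "a", "c"]})
out = filter_min_activity(toy, min_per_rater=2, min_per_note=2)
print(out)
```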
INFO:birdwatch.scorer:MFGroupScorer_12 Prepare ratings elapsed time: 0.24 secs (0.00 mins)
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_12: f3cba71e0c7584b6d90e534b4cedb593c2463cbb424140a5e6cdf3969541e16d
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_12: bab26056c2c51b2beff057f4e0fe86fa071404051fc10055597e708318a26b55
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_12: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 6608, Notes: 31027
INFO:birdwatch.matrix_factorization:learning rate set to :1.0
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 14.094466110162116
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 66.17872276029055
INFO:birdwatch.matrix_factorization:epoch 0 6.562259197235107
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.071473598480225
INFO:birdwatch.matrix_factorization:epoch 20 0.33518463373184204
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.266846626996994
INFO:birdwatch.matrix_factorization:epoch 40 0.1381341516971588
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09837325662374496
INFO:birdwatch.scorer: Ratings after group filter: 118317340
INFO:birdwatch.matrix_factorization:epoch 60 0.10990920662879944
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07621243596076965
INFO:birdwatch.matrix_factorization:epoch 80 0.10537681728601456
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07265415787696838
INFO:birdwatch.scorer: Ratings after group filter: 35062479
INFO:birdwatch.scorer:MFExpansionPlusScorer Filter input elapsed time: 50.81 secs (0.85 mins)
INFO:birdwatch.mf_base_scorer:seeding with 0
INFO:birdwatch.matrix_factorization:epoch 100 0.1048186868429184
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07219789177179337
INFO:birdwatch.matrix_factorization:epoch 120 0.10474533587694168
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07216211408376694
INFO:birdwatch.scorer:MFGroupScorer_13 Filter input elapsed time: 52.66 secs (0.88 mins)
INFO:birdwatch.mf_base_scorer:seeding with 0
INFO:birdwatch.matrix_factorization:epoch 140 0.10473541915416718
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07215089350938797
INFO:birdwatch.matrix_factorization:Num epochs: 146
INFO:birdwatch.matrix_factorization:epoch 146 0.10473485291004181
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07215223461389542
INFO:birdwatch.matrix_factorization:Global Intercept: 0.18742765486240387
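The MFGroupScorer_12 block trains a matrix-factorization model on the 437,309 filtered ratings. The logged density figures are simply ratings divided by notes and by users: 437,309 / 31,027 ≈ 14.09 and 437,309 / 6,608 ≈ 66.18. Community Notes' published model predicts a rating as a global intercept plus a user intercept, a note intercept, and the dot product of user and note factors, fitted with regularization. The `Global Intercept` line presumably reports the first of these terms.

Below is a small NumPy sketch of that model form, using full-batch gradient descent on toy data. The real scorer runs on PyTorch (the log reports `cpu` and a learning rate of 1.0), and this sketch does not reproduce its optimizer, loss weighting, regularization strengths, or stopping rule. All hyperparameters here are illustrative.

```python
import numpy as np


def fit_mf(users, notes, ratings, n_users, n_notes, dim=1, lr=0.2,
           lam_i=0.05, lam_f=0.01, epochs=3000, seed=0):
    """Toy fit of r_un ~ mu + i_u + i_n + f_u . f_n by full-batch gradient descent.

    Objective (mean over observed ratings):
      (r_un - pred)^2 + lam_i * (mu^2 + i_u^2 + i_n^2) + lam_f * (|f_u|^2 + |f_n|^2)
    """
    rng = np.random.default_rng(seed)
    m = len(ratings)
    mu = 0.0
    i_u, i_n = np.zeros(n_users), np.zeros(n_notes)
    f_u = rng.normal(scale=0.1, size=(n_users, dim))
    f_n = rng.normal(scale=0.1, size=(n_notes, dim))

    def predict():
        return mu + i_u[users] + i_n[notes] + np.sum(f_u[users] * f_n[notes], axis=1)

    def loss():
        reg = lam_i * (mu**2 + i_u[users]**2 + i_n[notes]**2) + lam_f * (
            np.sum(f_u[users]**2, axis=1) + np.sum(f_n[notes]**2, axis=1))
        return float(np.mean((ratings - predict())**2 + reg))

    history = [loss()]
    for _ in range(epochs):
        err = predict() - ratings  # residual per observed rating
        # Gradients of the objective; np.add.at sums over repeated user/note indices.
        g_mu = np.mean(2 * err + 2 * lam_i * mu)
        g_iu = np.zeros(n_users)
        np.add.at(g_iu, users, 2 * err + 2 * lam_i * i_u[users])
        g_in = np.zeros(n_notes)
        np.add.at(g_in, notes, 2 * err + 2 * lam_i * i_n[notes])
        g_fu = np.zeros_like(f_u)
        np.add.at(g_fu, users, 2 * err[:, None] * f_n[notes] + 2 * lam_f * f_u[users])
        g_fn = np.zeros_like(f_n)
        np.add.at(g_fn, notes, 2 * err[:, None] * f_u[users] + 2 * lam_f * f_n[notes])
        # Simultaneous step: every gradient above was computed from the old values.
        mu -= lr * g_mu
        i_u -= lr * g_iu / m
        i_n -= lr * g_in / m
        f_u -= lr * g_fu / m
        f_n -= lr * g_fn / m
        history.append(loss())
    return mu, i_u, i_n, f_u, f_n, history


# Toy data: 4 users x 3 notes, fully observed. Note 0 is rated helpful by everyone;
# users 0-1 and users 2-3 disagree on notes 1 and 2.
users = np.repeat(np.arange(4), 3)
notes = np.tile(np.arange(3), 4)
ratings = np.array([1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1], dtype=float)
mu, i_u, i_n, f_u, f_n, history = fit_mf(users, notes, ratings, n_users=4, n_notes=3)
print(f"loss {history[0]:.3f} -> {history[-1]:.3f}, global intercept {mu:.3f}")
```

In this toy setup the note intercept absorbs the agreement on note 0, while the single factor dimension absorbs the split between the two user pairs.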
INFO:birdwatch.scorer:MFGroupScorer_12 First MF/stable init elapsed time: 7.58 secs (0.13 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_12 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.59 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.72 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.58 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.55 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.64 secs (0.01 mins) | |
INFO:birdwatch.scorer: Ratings after group filter: 118315149 | |
INFO:birdwatch.scorer: Ratings after group filter: 102142736 | |
INFO:birdwatch.scorer:MFExpansionScorer Filter input elapsed time: 64.19 secs (1.07 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.scorer:ReputationScorer Filter input elapsed time: 65.57 secs (1.09 mins) | |
INFO:birdwatch.reputation_scorer:seeding with 0 | |
INFO:birdwatch.scorer: Ratings after group filter: 102142736 | |
INFO:birdwatch.scorer:MFCoreScorer Filter input elapsed time: 67.26 secs (1.12 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_13: e95c5625a0d4dd968718afbe26d9d681556853401ec0b51633dc280c165ab7af | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 32.07 secs (0.53 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_12 Compute scored notes elapsed time: 39.03 secs (0.65 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.note_ratings:Total ratings: 764508 post-tombstones and 120 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 605526, including 605526 post-tombstones and 0 pre-tombstones. | |
INFO:birdwatch.note_ratings:Total valid ratings: 41228 | |
INFO:birdwatch.scorer:MFGroupScorer_12 Compute valid ratings elapsed time: 1.01 secs (0.02 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
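The JOIN ERROR entries show counts that should be int64 coming back as float64, and the log marks these mismatches as allowed. In pandas, this happens when a left join leaves some rows without a match. The missing cells become NaN, and because NumPy's int64 cannot hold NaN, the column is upcast to float64. Below is a minimal sketch with hypothetical frames, not the scorer's actual `authorCounts` join. It shows the upcast and two common ways to keep integer counts.

```python
import pandas as pd

# Hypothetical per-participant counts; not the scorer's actual frames.
author_counts = pd.DataFrame({"noteCount": [3, 5]}, index=["a", "b"])
rating_counts = pd.DataFrame({"ratingCount": [10]}, index=["a"])

# Participant "b" has no match on the right, so ratingCount gets NaN -> float64.
joined = author_counts.join(rating_counts, how="left")
print(joined.dtypes["ratingCount"])

# Remedy 1: treat a missing count as zero, then cast back to int64.
filled = joined.fillna({"ratingCount": 0}).astype({"ratingCount": "int64"})

# Remedy 2: keep "missing" distinct from zero with pandas' nullable Int64 dtype.
nullable = author_counts.join(rating_counts.astype({"ratingCount": "Int64"}), how="left")
print(nullable)
```

Filling with 0 is only appropriate when a missing row really means a zero count. The nullable `Int64` dtype keeps "zero" and "unknown" apart.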
INFO:birdwatch.scorer:MFGroupScorer_12 Helpfulness scores pre-harassment elapsed time: 0.14 secs (0.00 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 6608 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 22716 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 5863 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 5171 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 437309 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 370794 | |
INFO:birdwatch.scorer:MFGroupScorer_12 Filtering by helpfulness score elapsed time: 0.50 secs (0.01 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 243971 | |
1 15588 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 111235 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 217290, Num Unique Notes Rated: 17109, Num Unique Raters: 4533 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 206721 | |
1 10569 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.04864006627088223 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 19.559182514902073 with BCEWithLogitsLoss | |
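The positive rate and pos weight logged above can be reproduced from the post-filtering label breakdown:

- The positive rate matches positives / (positives + negatives).
- The pos weight matches negatives / positives.

This relationship is inferred from the numbers in this log, not read from the scorer's source. The weight is the kind of value PyTorch's `torch.nn.BCEWithLogitsLoss` accepts as `pos_weight`, which scales the loss on positive labels to offset class imbalance. The arithmetic check below uses the counts from this run and from the MFGroupScorer_11 run further down the log.

```python
def tag_weights(negatives: int, positives: int) -> tuple[float, float]:
    """Return (positive rate, pos weight) for a binary tag label breakdown."""
    positive_rate = positives / (positives + negatives)
    pos_weight = negatives / positives  # upweights the rare positive class
    return positive_rate, pos_weight


# Post-filtering breakdowns of notHelpfulSpamHarassmentOrAbuseLabel in this log.
for name, neg, pos in [("MFGroupScorer_12", 206721, 10569), ("MFGroupScorer_11", 436078, 20281)]:
    rate, weight = tag_weights(neg, pos)
    print(f"{name}: positive rate={rate:.6f}, pos weight={weight:.6f}")
```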
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 4533, Notes: 17109 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
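The FutureWarning above flags `noteInit[col].fillna(0.0, inplace=True)` as chained assignment. Under pandas 3.0 Copy-on-Write, `noteInit[col]` always behaves as a copy, so the inplace fill would no longer update `noteInit`. Below is a minimal sketch of the two rewrites the warning itself recommends. It uses a hypothetical frame whose column name only mirrors the `internalNoteIntercept` key seen in this log.

```python
import numpy as np
import pandas as pd

# Hypothetical note-initialization frame; values are illustrative.
note_init = pd.DataFrame({"noteId": [1, 2, 3], "internalNoteIntercept": [0.3, np.nan, -0.1]})

# Option 1: reassign the column rather than calling an inplace method on it.
note_init["internalNoteIntercept"] = note_init["internalNoteIntercept"].fillna(0.0)

# Option 2: call the inplace method on the DataFrame itself with a {column: value} mapping.
other_init = pd.DataFrame({"noteId": [4], "internalNoteIntercept": [np.nan]})
other_init.fillna({"internalNoteIntercept": 0.0}, inplace=True)

print(note_init)
print(other_init)
```

Both forms modify the DataFrame that owns the data, so they behave the same before and after the pandas 3.0 change.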
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 12.700333157987025 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 47.93514228987426 | |
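The two density figures above are consistent with the filtered rating count divided by the number of notes and by the number of users. The preceding filter line reports 217290 ratings, 17109 notes and 4533 raters. This formula is inferred from the logged numbers, not read from the scorer's code. A quick check:

```python
def ratings_density(num_ratings: int, num_notes: int, num_users: int) -> tuple[float, float]:
    """Return (ratings per note, ratings per user) for a filtered ratings set."""
    return num_ratings / num_notes, num_ratings / num_users


# Counts taken from the "After applying min 10 ratings per rater..." line above.
per_note, per_user = ratings_density(217290, 17109, 4533)
print(per_note, per_user)
```

The same ratios reproduce the later figures in the log. For example, 317516 final-training ratings over 31015 notes and 4847 users gives about 10.2375 ratings per note and 65.5077 per user.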
INFO:birdwatch.matrix_factorization:epoch 0 3.3938112258911133 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.4814504384994507 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.7128185033798218 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.37842825055122375 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.4487408995628357 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.26448583602905273 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.4117359519004822 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.25155332684516907 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.406838059425354 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.24967393279075623 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.4058714509010315 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2489582747220993 | |
INFO:birdwatch.matrix_factorization:Num epochs: 101 | |
INFO:birdwatch.matrix_factorization:epoch 101 0.4058714509010315 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2489582747220993 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.3002163767814636 | |
INFO:birdwatch.scorer:MFGroupScorer_12 Harassment tag consensus elapsed time: 2.76 secs (0.05 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_12 Helpfulness scores post-harassment elapsed time: 0.17 secs (0.00 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 6608 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 22716 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 5539 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 4847 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 437309 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 317516 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 4847, Notes: 31015 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 10.237497984846042 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 65.50773674437797 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.3754280209541321 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.30170732736587524 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.10186462104320526 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06613680720329285 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10004783421754837 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06793586164712906 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09867382049560547 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06488165259361267 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.09862621873617172 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06506084650754929 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.09860589355230331 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06476974487304688 | |
INFO:birdwatch.matrix_factorization:Num epochs: 103 | |
INFO:birdwatch.matrix_factorization:epoch 103 0.09860695898532867 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06493175774812698 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.18789513409137726 | |
INFO:birdwatch.constants:Final round MF elapsed time: 4.18 secs (0.07 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_12 prescoring, about to call diligence with 317516 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 001041D12A03F39CCB40BEA9458C469323254EEC76348B... -0.149588 | |
1 002A62303516D0CCE7BCBD143AE53FACB0FE03168AEA4E... 0.191009 | |
2 0037306269989273D720BBD181462AC844B31CB9003939... -0.305849 | |
3 0049F294210C39AE0E4AECF5FC2AC7FC51B7E09B968CC3... 0.091642 | |
4 00661AF4F42FD3F9F04048E1F668A3ADB341546490E117... 0.051435 | |
... ... ... | |
4842 FFA492BC3E2F5B0DF00DC824605BC9FA92EB3DB63A4042... -0.449168 | |
4843 FFA9BCEF8D874B50FCC1914BB47BE36B2BCAD5EC1396CD... 0.438851 | |
4844 FFB689E24DF9F3E4E9DB93A95E13168392B1382A78C446... 0.079098 | |
4845 FFC75BB262A6BBDC07F13902786D170008F7DC3D11B4DC... -0.619702 | |
4846 FFEEE02BCED1134EB1C57875779C03F2135B72BB4C8E7F... 0.535221 | |
[4847 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 4847, vs. num we are initializing: 4847 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 4847 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=18.042370 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.444522 | time=0.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.983139 | time=1.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.897624 | time=2.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.868616 | time=3.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.854172 | time=3.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.845141 | time=4.4s | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 34070222, Num Unique Notes Rated: 608321, Num Unique Raters: 160470 | |
INFO:birdwatch.scorer:MFGroupScorer_13 Prepare ratings elapsed time: 18.21 secs (0.30 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.839076 | time=5.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=2.834849 | time=5.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=2.831749 | time=6.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=2.829500 | time=7.3s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(0.2601, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=2.829440 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.751500 | time=0.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.740840 | time=1.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.740027 | time=2.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.739998 | time=2.4s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.546930 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.473663 | time=0.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.472706 | time=0.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.472668 | time=1.0s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(1.1786, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 2.8295, 1.7400, 0.4727 | |
INFO:birdwatch.scorer:MFGroupScorer_12 Low Diligence MF elapsed time: 11.12 secs (0.19 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.61 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.72 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.63 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.65 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.71 secs (0.01 mins) | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_13: 0140f015592da55bc81d7c91f7870e8d00b1100bc24e2a8e89521269b1a2f48f | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_13: bab26056c2c51b2beff057f4e0fe86fa071404051fc10055597e708318a26b55 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_13: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 100691291, Num Unique Notes Rated: 1205894, Num Unique Raters: 590255 | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 32.32 secs (0.54 mins) | |
INFO:birdwatch.constants:MFGroupScorer_12: Compute tag thresholds for percentiles elapsed time: 0.63 secs (0.01 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.run_scoring:MFGroupScorer_11 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 160470, Notes: 608321 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 56.00697986753704 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 212.3152115660248 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.642144203186035 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.204066276550293 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFExpansionPlusScorer: c2198f60420b6359f2488810e7f6425a3f37c27712ac4c0b8d5b149dfbe06904 | |
INFO:birdwatch.run_scoring:MFGroupScorer_11 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_11 run_scorer_parallelizable: Loading data elapsed time: 19.16 secs (0.32 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_11 set to: 4 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_11. Original rating length: 118317340 | |
INFO:birdwatch.scorer: Ratings after topic filter: 118317340 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFCoreScorer: 6903bc703edd6c4c594895c5b01a3b6e0e242cd09bc55a62a94781769fc852f5 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFExpansionScorer: 1e064ba29cbf395160cd1baa4b27703664af00c10137b3543593690e6ccbc15f | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings after group filter: 1735379 | |
INFO:birdwatch.scorer:MFGroupScorer_11 Filter input elapsed time: 45.03 secs (0.75 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_11: e332ea9e9ba4960426c29a941bcc4e795ebd4179b79c14b6d35e01cf706ec47c | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 1102007, Num Unique Notes Rated: 91934, Num Unique Raters: 8768 | |
INFO:birdwatch.scorer:MFGroupScorer_11 Prepare ratings elapsed time: 0.76 secs (0.01 mins) | |
INFO:birdwatch.matrix_factorization:epoch 20 0.3273569345474243 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.28068506717681885 | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_11: 65988814905281790c094924499778fee281743f74975cd87721713e1101b77d | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_11: bab26056c2c51b2beff057f4e0fe86fa071404051fc10055597e708318a26b55 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_11: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 8768, Notes: 91934 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 11.986936280375051 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 125.6851049270073 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.546855449676514 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.067481517791748 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.30668893456459045 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.23621338605880737 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.15802520513534546 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.12420892715454102 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.11111921072006226 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07841742783784866 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.10464991629123688 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0745537057518959 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10383251309394836 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07395748794078827 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.10372518002986908 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07388006150722504 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.10371161997318268 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0738646388053894 | |
INFO:birdwatch.matrix_factorization:Num epochs: 148 | |
INFO:birdwatch.matrix_factorization:epoch 148 0.10371063649654388 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07386238873004913 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1659136414527893 | |
INFO:birdwatch.scorer:MFGroupScorer_11 First MF/stable init elapsed time: 17.04 secs (0.28 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_11 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 100691291, Num Unique Notes Rated: 1205894, Num Unique Raters: 590255 | |
INFO:birdwatch.scorer:MFCoreScorer Prepare ratings elapsed time: 62.12 secs (1.04 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 149458, Notes: 123028 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 67.27695321390253 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 55.37976555286435 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 116569833, Num Unique Notes Rated: 1296974, Num Unique Raters: 747994 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.619696140289307 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.176379203796387 | |
INFO:birdwatch.scorer:MFExpansionPlusScorer Prepare ratings elapsed time: 71.17 secs (1.19 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.61 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.73 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.58 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.58 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.matrix_factorization:epoch 20 0.30686208605766296 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.255319207906723 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 116567689, Num Unique Notes Rated: 1296971, Num Unique Raters: 747974 | |
INFO:birdwatch.scorer:MFExpansionScorer Prepare ratings elapsed time: 68.86 secs (1.15 mins) | |
INFO:birdwatch.matrix_factorization:epoch 40 0.11687351018190384 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08992145210504532 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09446366131305695 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07093754410743713 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.18463000655174255 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.15509241819381714 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.09115439653396606 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06826052069664001 | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.39 secs (0.56 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_11 Compute scored notes elapsed time: 41.37 secs (0.69 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.note_ratings:Total ratings: 1734848 post-tombstones and 531 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 1415596, including 1415593 post-tombstones and 3 pre-tombstones. | |
INFO:birdwatch.note_ratings:Total valid ratings: 85294 | |
INFO:birdwatch.scorer:MFGroupScorer_11 Compute valid ratings elapsed time: 2.04 secs (0.03 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_11 Helpfulness scores pre-harassment elapsed time: 0.25 secs (0.00 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 8768 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 48937 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 7697 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 7119 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 1102007 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 923054 | |
INFO:birdwatch.scorer:MFGroupScorer_11 Filtering by helpfulness score elapsed time: 1.34 secs (0.02 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 554112 | |
1 32985 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 335957 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 456359, Num Unique Notes Rated: 43168, Num Unique Raters: 6330 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 436078 | |
1 20281 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.04444088973812284 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 21.501799714018045 with BCEWithLogitsLoss | |
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 6330, Notes: 43168
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
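The FutureWarning above recurs later in this log for the same lines of matrix_factorization.py and for helpfulness_scores.py, and it states its own fix. Below is a minimal sketch of both suggested forms on a toy frame. It assumes the column name `internalNoteIntercept` is the value of `c.internalNoteInterceptKey`.

```python
import numpy as np
import pandas as pd


def make_note_init() -> pd.DataFrame:
    # Toy stand-in for noteInit; one note has no initial intercept.
    return pd.DataFrame(
        {"noteId": [1, 2, 3], "internalNoteIntercept": [0.5, np.nan, -0.2]}
    )


# Flagged pattern: fillna runs on an intermediate Series, so under
# pandas 3.0 copy-on-write semantics noteInit itself is never updated.
#   noteInit["internalNoteIntercept"].fillna(0.0, inplace=True)

# Fix 1: assign the filled Series back to the column.
fixed_assign = make_note_init()
fixed_assign["internalNoteIntercept"] = fixed_assign["internalNoteIntercept"].fillna(0.0)

# Fix 2: call fillna on the DataFrame itself with a {column: value} mapping.
fixed_inplace = make_note_init()
fixed_inplace.fillna({"internalNoteIntercept": 0.0}, inplace=True)

print(fixed_assign["internalNoteIntercept"].tolist())  # [0.5, 0.0, -0.2]
```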
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:learning rate set to :2.0
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 10.571696627131209
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 72.09462875197472
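The two density figures are plain ratios of counts logged earlier: ratings divided by notes, and ratings divided by users (here 456359 ratings over 43168 notes and 6330 raters). The same arithmetic reproduces the later final-training figures, which use 728170 ratings over 91737 notes and 6552 users.

```python
def ratings_density(num_ratings: int, num_notes: int, num_users: int) -> tuple[float, float]:
    """Return (ratings per note, ratings per user)."""
    return num_ratings / num_notes, num_ratings / num_users


# Tag-consensus model: counts from the min-ratings filter and its "Users/Notes" line.
print(ratings_density(456359, 43168, 6330))
# Final-round MF: "Number of Ratings for Final Training" and its "Users/Notes" line.
print(ratings_density(728170, 91737, 6552))
```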
INFO:birdwatch.matrix_factorization:epoch 0 3.248602867126465
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.3487613201141357
INFO:birdwatch.matrix_factorization:epoch 100 0.09072532504796982
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06801789253950119
INFO:birdwatch.matrix_factorization:epoch 20 0.6236096024513245
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2766990661621094
INFO:birdwatch.matrix_factorization:epoch 40 0.3938358724117279
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.22185608744621277
INFO:birdwatch.matrix_factorization:epoch 60 0.362994909286499
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.21204563975334167
INFO:birdwatch.matrix_factorization:epoch 80 0.35855334997177124
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2106824815273285
INFO:birdwatch.matrix_factorization:epoch 100 0.3579177260398865
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.21060511469841003
INFO:birdwatch.matrix_factorization:Num epochs: 104
INFO:birdwatch.matrix_factorization:epoch 104 0.35793840885162354
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.21066758036613464
INFO:birdwatch.matrix_factorization:Global Intercept: -0.2972601056098938
INFO:birdwatch.scorer:MFGroupScorer_11 Harassment tag consensus elapsed time: 5.60 secs (0.09 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
scoredNotes.loc[:, c.noteCountKey] = 1
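The SettingWithCopyWarning means `scoredNotes` was probably derived from another frame by filtering or slicing, so pandas cannot tell whether the `.loc` write will reach the original. The log does not show how `scoredNotes` is built. Assuming it comes from a boolean filter, taking an explicit `.copy()` makes it an independent frame and the assignment unambiguous. The `status` column and its values are hypothetical; `noteCount` is the column named in the join errors that follow.

```python
import pandas as pd

notes = pd.DataFrame({"noteId": [1, 2, 3], "status": ["CRH", "NMR", "CRH"]})

# Filtering can yield an object pandas tracks as a possible copy of `notes`;
# writing to it directly is what triggers SettingWithCopyWarning.
# .copy() makes scoredNotes own its data, so the write below is unambiguous.
scoredNotes = notes[notes["status"] == "CRH"].copy()
scoredNotes.loc[:, "noteCount"] = 1

print(scoredNotes)
```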
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join(
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed)
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed)
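The "(allowed)" JOIN ERROR lines show integer count columns coming out of `authorCounts.join(...)` as float64. Pandas does this whenever a join leaves some keys without a match on one side: the missing cells become NaN, and int64 cannot hold NaN. The scorer's real join arguments are not shown in this log. The sketch below uses hypothetical toy frames to reproduce the upcast and two common ways to keep integer counts: fill and cast back, or use pandas' nullable `Int64` dtype.

```python
import pandas as pd

# Toy per-participant counts; indexes are participant IDs (strings, hence an object index).
authorCounts = pd.DataFrame({"crhBool": [2, 0]}, index=["authorA", "authorB"])
raterCounts = pd.DataFrame({"ratingCount": [5, 7]}, index=["authorA", "raterC"])

# Outer join: authorB has no ratingCount and raterC has no crhBool -> NaN -> float64.
joined = authorCounts.join(raterCounts, how="outer")
print(joined.dtypes)

# Option 1: treat "missing" as zero and cast back to int64.
filled = joined.fillna(0).astype("int64")

# Option 2: nullable integers keep missing values as <NA> without upcasting.
nullable = authorCounts.astype("Int64").join(raterCounts.astype("Int64"), how="outer")
print(nullable.dtypes)
```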
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True)
INFO:birdwatch.scorer:MFGroupScorer_11 Helpfulness scores post-harassment elapsed time: 0.33 secs (0.01 mins)
INFO:birdwatch.helpfulness_scores:Unique Raters: 8768
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 48937
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 7130
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 6552
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 1102007
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 728170
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 6552, Notes: 91737
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:learning rate set to :0.2
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 7.9375824367485315
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 111.13705738705738
INFO:birdwatch.matrix_factorization:epoch 0 0.38524729013442993
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.31691431999206543
INFO:birdwatch.matrix_factorization:epoch 20 0.1026439443230629
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06895100325345993
INFO:birdwatch.matrix_factorization:epoch 120 0.09067422151565552
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06799975037574768
INFO:birdwatch.matrix_factorization:epoch 40 0.09979289770126343
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06984815001487732
INFO:birdwatch.matrix_factorization:epoch 60 0.09856418520212173
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06731922924518585
INFO:birdwatch.matrix_factorization:epoch 80 0.0985039696097374
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0672340840101242
INFO:birdwatch.matrix_factorization:epoch 100 0.0984891727566719
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06704679876565933
INFO:birdwatch.matrix_factorization:Num epochs: 103
INFO:birdwatch.matrix_factorization:epoch 103 0.09848850965499878
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06718754023313522
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1667405664920807
INFO:birdwatch.constants:Final round MF elapsed time: 9.40 secs (0.16 mins)
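The `birdwatch.constants` timing lines all follow one shape: a label, the elapsed seconds to two decimals, and the same duration in minutes. The helper that emits them is not shown here. A context manager like the hypothetical `log_elapsed_time` below would produce lines in that format. With `logging.basicConfig`, whose default format is `%(levelname)s:%(name)s:%(message)s`, the output also carries the same `INFO:birdwatch.constants:` prefix.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("birdwatch.constants")  # logger name taken from the log prefix


def format_elapsed(label: str, secs: float) -> str:
    # e.g. "Final round MF elapsed time: 9.40 secs (0.16 mins)"
    return f"{label} elapsed time: {secs:.2f} secs ({secs / 60:.2f} mins)"


@contextmanager
def log_elapsed_time(label: str):
    """Hypothetical timer: logs the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        logger.info(format_elapsed(label, time.perf_counter() - start))


logging.basicConfig(level=logging.INFO)
with log_elapsed_time("Final round MF"):
    time.sleep(0.01)  # stand-in for the real work
```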
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_11 prescoring, about to call diligence with 728170 final round ratings.
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
None,
raterInitState:
raterParticipantId internalRaterFactor1
0 00055253971F408A7AB80D461A543E010EC67DFAF29C45... -0.700129
1 0007EFAB89EB0BCC18E8994B141F291F33C9CB80B9332E... 0.254400
2 000E374F324AEBE8A92439EEC0C3DDE191F293CEF88509... 0.248246
3 001496B1846E8D6B3857F889E75BE6CCB011824EFE36A0... -0.406397
4 0026B9EAF060D14AFF58688B43EC51C5D2D92444A05DB8... -0.617673
... ... ...
6547 FFBD7465A1175CF9CC7D37B2DB9689BA6469FD38417350... -0.028069
6548 FFC1E16D320BD9589C96893BD161C6F9FDE5FC3C7C2D8E... -0.522187
6549 FFC83F58410624DF16CD78060076B6070F13ACA978E417... -0.307979
6550 FFE852866BE827C0D92EAC6FC2A68007E79120FD605090... -0.423836
6551 FFFA49720F254411E1F79CA757C403F0A0217240BC4922... 0.454771
[6552 rows x 2 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 6552, vs. num we are initializing: 6552
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 6552
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen
INFO:birdwatch.reputation_matrix_factorization:Round 1:
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=15.903498 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.327427 | time=1.7s
INFO:birdwatch.matrix_factorization:epoch 140 0.09066678583621979
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06797289103269577
INFO:birdwatch.matrix_factorization:Num epochs: 141
INFO:birdwatch.matrix_factorization:epoch 141 0.09066678583621979
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06797289103269577
INFO:birdwatch.matrix_factorization:Global Intercept: 0.13837778568267822
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.885503 | time=3.4s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.796915 | time=5.1s
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.764297 | time=6.8s
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.746885 | time=8.5s
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.736106 | time=10.2s
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.728972 | time=11.8s
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=2.724138 | time=13.5s
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=2.720786 | time=15.2s
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=2.718410 | time=16.8s
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing:
tensor(-0.2027, requires_grad=True)
INFO:birdwatch.reputation_matrix_factorization:
Round 2: learn rater rep (and everything else), freeze note intercept
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=2.718344 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.604440 | time=1.7s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.594247 | time=3.3s
INFO:birdwatch.matrix_factorization:epoch 60 0.11632546782493591
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09168201684951782
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.593413 | time=5.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.593413 | time=5.0s
INFO:birdwatch.reputation_matrix_factorization:
Round 3: fit intercepts and global intercept with everything else frozen
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.515919 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.455283 | time=1.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.454486 | time=1.9s
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.454454 | time=2.4s
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing:
tensor(0.6333, requires_grad=True)
INFO:birdwatch.diligence_model:Low diligence training loss: 2.7184, 1.5934, 0.4545
INFO:birdwatch.scorer:MFGroupScorer_11 Low Diligence MF elapsed time: 25.07 secs (0.42 mins)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.60 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.64 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.57 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.54 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.70 secs (0.01 mins)
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFCoreScorer: 099c75f5676634aebbee9d3781661451c5e2fe162a1d5e3752616c832ae8b31d
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFCoreScorer: bab26056c2c51b2beff057f4e0fe86fa071404051fc10055597e708318a26b55
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFCoreScorer: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614
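Each `summary` line ends in a 64-character hex string, the length of a SHA-256 digest. In this section the `noteStatusHistory` and `userEnrollmentRaw` summaries are identical for every scorer, while `ratingsForTraining` differs per scorer. That pattern fits a fingerprint of each scorer's input frame. How the scorer actually derives these strings is not shown. One way to fingerprint a DataFrame deterministically is to hash its rows with `pandas.util.hash_pandas_object` and feed the result to SHA-256, as in the hypothetical `frame_summary` helper below. The `helpfulNum` column is illustrative only.

```python
import hashlib

import pandas as pd


def frame_summary(df: pd.DataFrame) -> str:
    """Hypothetical fingerprint: SHA-256 over per-row pandas hashes (index included)."""
    row_hashes = pd.util.hash_pandas_object(df, index=True)  # uint64 Series, one per row
    return hashlib.sha256(row_hashes.to_numpy().tobytes()).hexdigest()


ratings = pd.DataFrame({"noteId": [1, 2], "helpfulNum": [1.0, 0.0]})
print(frame_summary(ratings))
```

This fingerprint depends on row order and index values, so two frames with the same rows in a different order hash differently.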
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 32.43 secs (0.54 mins)
INFO:birdwatch.constants:MFGroupScorer_11: Compute tag thresholds for percentiles elapsed time: 1.47 secs (0.02 mins)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge(
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed)
INFO:birdwatch.run_scoring:MFGroupScorer_10 run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.matrix_factorization:epoch 80 0.10604341328144073
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08182567358016968
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFExpansionScorer: 27d0f80081eb899c9a52d60cd32b94df11ca3859b98459f0d861f853f8d75b23
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFExpansionScorer: bab26056c2c51b2beff057f4e0fe86fa071404051fc10055597e708318a26b55
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFExpansionScorer: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFExpansionPlusScorer: fe89c853311746933ca2a395e837640fccdb18d4e9348788c9ed7867790705f2
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFExpansionPlusScorer: bab26056c2c51b2beff057f4e0fe86fa071404051fc10055597e708318a26b55
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFExpansionPlusScorer: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614
INFO:birdwatch.run_scoring:MFGroupScorer_10 run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFGroupScorer_10 run_scorer_parallelizable: Loading data elapsed time: 18.80 secs (0.31 mins)
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_10 set to: 4
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_10. Original rating length: 118317340
INFO:birdwatch.scorer: Ratings after topic filter: 118317340
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
noteId internalNoteFactor1
0 1354933402240229380 -0.109901
1 1357798998405447682 -0.502796
2 1360871260054503427 0.302926
3 1361842531655376899 0.321590
4 1362121547511521284 0.340940
... ... ...
123023 1513190235412353030 0.407456
123024 1649508090755317761 0.389938
123025 1819976965786669296 0.274859
123026 1645870102506622976 -0.201437
123027 1642576751309062145 -0.047146
[123028 rows x 2 columns],
raterInitState:
raterParticipantId internalRaterFactor1
0 F35972BBD2F99515FD974E9C7AFD899970F2E4A5911513... -0.760034
1 9D41130B60D66BCC6FAA1115676546405A37F3BC90991F... -0.753539
2 EBDCB80B1EC4A9FB51C8A562377D72F9569692DEFFC8BC... -0.778270
3 E23374E04DD1B97ED5E4BE68F56CD25AE5DE53DD2A3541... -0.406109
4 60D2AB8839D3EF47DD1C377DD8246EBA76ECB17DD65F13... -0.533711
... ... ...
149453 7C60F353091E8F57A620BC71CF1B2A8C810EA76EC08066... 0.403700
149454 52AA9044A0F07DAA38C17980B05939439F2D9997EC18B6... -0.601153
149455 B52B43A6F65EEF7B6F4384E83140C4DF05B697E95F1C94... 0.007266
149456 618D046E3843E3EEF9DF23CD573651832C4A09A79DB4FF... 0.717910
149457 5E45E0D9B9BCB6DED42386DA77A71E6EC4C07578AA3158... 0.236773
[149458 rows x 2 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 590255, vs. num we are initializing: 149458
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 440797
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 149458
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 1205894, vs. num we are initializing: 123028
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 1189336
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 16558
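In the note block above, 123028 rows are supplied for initialization, but only 16558 notes are reported as initialized. Also, 16558 + 1189336 = 1205894, the number of notes in the dataset. The rater block splits the same way (149458 + 440797 = 590255). These counts are consistent with initializing only the IDs that appear both in the dataset and in the initial state. Below is a hypothetical sketch of that bookkeeping; `init_report` is not a scorer function.

```python
import pandas as pd


def init_report(dataset_ids, init_state: pd.DataFrame, id_col: str) -> dict:
    """Hypothetical counts for warm-starting factors from a previous run's state."""
    in_dataset = pd.Index(dataset_ids).unique()
    matched = in_dataset.isin(init_state[id_col])  # True where a prior value exists
    return {
        "num_in_dataset": len(in_dataset),
        "num_initializing": len(init_state),
        "initialized": int(matched.sum()),
        "uninitialized": int((~matched).sum()),
    }


# Toy example: note 5 has a prior value but is no longer in the dataset.
prior = pd.DataFrame({"noteId": [3, 4, 5], "internalNoteFactor1": [0.1, -0.3, 0.2]})
print(init_report([1, 2, 3, 4], prior, "noteId"))
# {'num_in_dataset': 4, 'num_initializing': 3, 'initialized': 2, 'uninitialized': 2}
```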
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen
INFO:birdwatch.reputation_matrix_factorization:Round 1:
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.120153 | time=0.7s
INFO:birdwatch.scorer:MFCoreScorer Prepare data for stable initialization elapsed time: 59.04 secs (0.98 mins)
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 149458, Notes: 123028
INFO:birdwatch.matrix_factorization:learning rate set to :1.0
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 67.27695321390253
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 55.37976555286435
INFO:birdwatch.matrix_factorization:epoch 0 6.619696140289307
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.176379203796387
INFO:birdwatch.matrix_factorization:epoch 100 0.10486361384391785
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08084195852279663
INFO:birdwatch.scorer: Ratings without assigned group: 0
INFO:birdwatch.matrix_factorization:epoch 20 0.30686208605766296
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.255319207906723
INFO:birdwatch.scorer: Ratings after group filter: 993406
INFO:birdwatch.scorer:MFGroupScorer_10 Filter input elapsed time: 43.48 secs (0.72 mins)
INFO:birdwatch.mf_base_scorer:seeding with 0
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_10: c2feb68644f0a2e54256799933e192d83362cac5abba0878e666f8736decee87
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note:
Num Ratings: 492516, Num Unique Notes Rated: 43889, Num Unique Raters: 6231
INFO:birdwatch.scorer:MFGroupScorer_10 Prepare ratings elapsed time: 0.28 secs (0.00 mins)
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_10: a7060dd5ed6ed5d0e29ea38bbb76e1272b098b37b950938fa7d57000ceb7e1ef
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_10: bab26056c2c51b2beff057f4e0fe86fa071404051fc10055597e708318a26b55
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_10: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 6231, Notes: 43889
INFO:birdwatch.matrix_factorization:learning rate set to :1.0
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 11.221855134543963
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 79.04285026480501
INFO:birdwatch.matrix_factorization:epoch 0 6.554954528808594
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.067558765411377
INFO:birdwatch.matrix_factorization:epoch 20 0.352897971868515
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2666269838809967
INFO:birdwatch.matrix_factorization:epoch 40 0.11687351018190384
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08992145210504532
INFO:birdwatch.matrix_factorization:epoch 40 0.16407400369644165
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.12644898891448975
INFO:birdwatch.matrix_factorization:epoch 60 0.11047092080116272
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0750611424446106
INFO:birdwatch.matrix_factorization:epoch 80 0.10238513350486755
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07031723856925964
INFO:birdwatch.matrix_factorization:epoch 100 0.10141704976558685
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0695534497499466
INFO:birdwatch.matrix_factorization:epoch 120 0.10129078477621078
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06945496052503586
INFO:birdwatch.matrix_factorization:epoch 140 0.10127484798431396
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06944364309310913
INFO:birdwatch.matrix_factorization:Num epochs: 150
INFO:birdwatch.matrix_factorization:epoch 150 0.10127347707748413
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06944052129983902
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1762676239013672
INFO:birdwatch.scorer:MFGroupScorer_10 First MF/stable init elapsed time: 8.01 secs (0.13 mins)
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_10
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
INFO:birdwatch.matrix_factorization:epoch 60 0.09446366131305695
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07093754410743713
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.64 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.71 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.65 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.60 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.72 secs (0.01 mins)
INFO:birdwatch.scorer:MFExpansionScorer Prepare data for stable initialization elapsed time: 68.47 secs (1.14 mins)
INFO:birdwatch.matrix_factorization:epoch 80 0.09115439653396606
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06826052069664001
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 143275, Notes: 102079
INFO:birdwatch.matrix_factorization:learning rate set to :1.0
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 62.60333663143252
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 44.60293840516489
INFO:birdwatch.matrix_factorization:epoch 0 6.619355201721191
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.175113677978516
INFO:birdwatch.scorer:MFExpansionPlusScorer Prepare data for stable initialization elapsed time: 70.43 secs (1.17 mins)
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 143274, Notes: 102077
INFO:birdwatch.matrix_factorization:learning rate set to :1.0
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 62.60223164865739
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 44.60158856456859
INFO:birdwatch.matrix_factorization:epoch 0 6.61724853515625
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.172989845275879
INFO:birdwatch.matrix_factorization:epoch 20 0.3509570062160492
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.29648557305336
INFO:birdwatch.matrix_factorization:epoch 100 0.09072532504796982
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06801789253950119
INFO:birdwatch.matrix_factorization:epoch 20 0.37635836005210876
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3036267161369324
INFO:birdwatch.matrix_factorization:epoch 40 0.1297946572303772
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10162457078695297
INFO:birdwatch.matrix_factorization:epoch 120 0.09067422151565552
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06799975037574768
INFO:birdwatch.matrix_factorization:epoch 120 0.10471677780151367
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08084765076637268
INFO:birdwatch.matrix_factorization:epoch 40 0.11769197881221771
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08740263432264328
INFO:birdwatch.matrix_factorization:epoch 60 0.09579639136791229
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07143382728099823
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.63 secs (0.56 mins)
INFO:birdwatch.scorer:MFGroupScorer_10 Compute scored notes elapsed time: 41.16 secs (0.69 mins)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed)
INFO:birdwatch.note_ratings:Total ratings: 993069 post-tombstones and 337 pre-tombstones
INFO:birdwatch.note_ratings:Total ratings created before statuses: 799077, including 799077 post-tombstones and 0 pre-tombstones.
INFO:birdwatch.note_ratings:Total valid ratings: 48495
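For MFGroupScorer_10, the post-tombstone and pre-tombstone rating counts add up to the 993406 ratings left after its group filter. This suggests that the tombstone split covers every rating that survived the filter. The arithmetic check below copies its numbers from this log.

```python
post_tombstones = 993069
pre_tombstones = 337
ratings_after_group_filter = 993406  # "Ratings after group filter" for MFGroupScorer_10

total = post_tombstones + pre_tombstones
print(total, total == ratings_after_group_filter)  # 993406 True
```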
INFO:birdwatch.scorer:MFGroupScorer_10 Compute valid ratings elapsed time: 1.20 secs (0.02 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
scoredNotes.loc[:, c.noteCountKey] = 1
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join(
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed)
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed)
INFO:birdwatch.scorer:MFGroupScorer_10 Helpfulness scores pre-harassment elapsed time: 0.17 secs (0.00 mins)
INFO:birdwatch.matrix_factorization:epoch 60 0.09446202218532562
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0713285282254219
INFO:birdwatch.helpfulness_scores:Unique Raters: 6231
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 30734
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 5769
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 5114
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 492516
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 416619
INFO:birdwatch.scorer:MFGroupScorer_10 Filtering by helpfulness score elapsed time: 0.58 secs (0.01 mins)
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse-------------------
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel
0 265129
1 15870
dtype: int64
INFO:birdwatch.tag_consensus:Number of rows with no tag label 135620
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note:
Num Ratings: 212213, Num Unique Notes Rated: 19225, Num Unique Raters: 4391
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel
0 203329
1 8884
dtype: int64
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0
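The MFGroupScorer_10 tag-consensus counts tie back to its helpfulness filtering. The 265129 + 15870 labeled rows plus the 135620 rows with no tag label equal the 416619 ratings kept for final training. The unlabeled rows then appear to be dropped, since the post-filtering count of unlabeled rows is 0. The min-ratings filter leaves 212213 rows, which matches the post-filtering breakdown of 203329 + 8884. The check below copies its numbers from this log.

```python
pre_filter = {0: 265129, 1: 15870}  # Pre-filtering breakdown of notHelpfulSpamHarassmentOrAbuseLabel
no_tag_label = 135620
final_training_ratings = 416619  # "Number of Ratings for Final Training" for MFGroupScorer_10

labeled = sum(pre_filter.values())
print(labeled, labeled + no_tag_label == final_training_ratings)  # 280999 True

post_filter = {0: 203329, 1: 8884}  # after min 10 ratings per rater / min 5 raters per note
print(sum(post_filter.values()))  # 212213, the "Num Ratings" on the filter line
```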
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.04186359930824219 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 22.88710040522287 with BCEWithLogitsLoss | |
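The tag-consensus counts above are internally consistent. The logged pos weight also matches the ratio of negative to positive labels after filtering. The sketch below recomputes these quantities from the logged numbers alone. Reading the pos weight as negatives / positives is an inference from the values; the log does not state it. In PyTorch, a value like this would be passed as `pos_weight` to `torch.nn.BCEWithLogitsLoss`.

```python
# Counts copied from the notHelpfulSpamHarassmentOrAbuse log lines above.
pre_negative, pre_positive = 265129, 15870
pre_unlabeled = 135620
post_negative, post_positive = 203329, 8884

# Labeled + unlabeled rows before filtering add up to "Number of Ratings for Final Training".
final_training_ratings = pre_negative + pre_positive + pre_unlabeled

# Rows surviving the min-ratings / min-raters filter add up to the "Num Ratings" line.
num_ratings = post_negative + post_positive

# Share of filtered ratings that carry the tag ("Positive Rate").
positive_rate = post_positive / num_ratings

# Assumption: pos weight = negatives / positives, which up-weights the rare positive class.
pos_weight = post_negative / post_positive

print(final_training_ratings, num_ratings)
print(positive_rate, pos_weight)
```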
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 4391, Notes: 19225
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
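This FutureWarning concerns chained assignment. `noteInit[col].fillna(..., inplace=True)` operates on an intermediate Series. Under pandas Copy-on-Write, which becomes the default in 3.0, that intermediate always behaves as a copy, so the fill would no longer reach `noteInit`. The minimal sketch below shows the two rewrites the warning suggests. It uses a hypothetical `noteInit` frame whose column names only stand in for the scorer's `c.internalNoteInterceptKey` and `c.note_factor_key(i)` constants.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the scorer's noteInit; column names are illustrative only.
noteInit = pd.DataFrame({
    "noteId": [1, 2, 3],
    "internalNoteIntercept": [0.5, np.nan, -0.2],
    "internalNoteFactor1": [np.nan, 0.1, np.nan],
})

# Pattern flagged by the warning (fills a temporary copy under Copy-on-Write):
#   noteInit["internalNoteIntercept"].fillna(0.0, inplace=True)

# Rewrite 1: assign the filled Series back to the column.
noteInit["internalNoteIntercept"] = noteInit["internalNoteIntercept"].fillna(0.0)

# Rewrite 2: call fillna on the DataFrame itself with a {column: value} mapping.
noteInit.fillna({"internalNoteFactor1": 0.0}, inplace=True)

print(noteInit)
```

Both forms modify `noteInit` itself, so they behave the same with and without Copy-on-Write.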
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:learning rate set to :2.0
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 11.038387516254877
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 48.32908221361877
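The "Ratings per note/user in dataset" figures appear to be plain ratios: the number of training ratings divided by the number of notes or users in the MF setup. The helper below is a hypothetical check, not scorer code, and its definition is an assumption. It reproduces the logged values for this run and for two runs later in this log: MFGroupScorer_10's final round and MFGroupScorer_9's first MF.

```python
def ratings_per_note_and_user(num_ratings: int, num_notes: int, num_users: int) -> tuple[float, float]:
    """Average ratings per note and per user (assumed definition of the logged values)."""
    return num_ratings / num_notes, num_ratings / num_users


# (ratings, notes, users) taken from the "Num Ratings" / "Users: ..., Notes: ..." lines of this log.
runs = {
    "tag consensus": (212213, 19225, 4391),
    "MFGroupScorer_10 final round": (343933, 43807, 4776),
    "MFGroupScorer_9 first MF": (4864106, 157547, 40204),
}

for name, (ratings, notes, users) in runs.items():
    per_note, per_user = ratings_per_note_and_user(ratings, notes, users)
    print(f"{name}: {per_note:.6f} ratings/note, {per_user:.6f} ratings/user")
```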
INFO:birdwatch.matrix_factorization:epoch 0 3.2986252307891846
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.391450047492981
INFO:birdwatch.matrix_factorization:epoch 20 0.6214652061462402
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.27479344606399536
INFO:birdwatch.matrix_factorization:epoch 40 0.40142756700515747
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2236347645521164
INFO:birdwatch.matrix_factorization:epoch 60 0.37024053931236267
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.21267293393611908
INFO:birdwatch.matrix_factorization:epoch 140 0.09066678583621979
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06797289103269577
INFO:birdwatch.matrix_factorization:Num epochs: 141
INFO:birdwatch.matrix_factorization:epoch 141 0.09066678583621979
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06797289103269577
INFO:birdwatch.matrix_factorization:Global Intercept: 0.13837778568267822
INFO:birdwatch.matrix_factorization:epoch 80 0.3657989501953125
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.21165578067302704
INFO:birdwatch.scorer:MFCoreScorer MF on stable-initialization subset elapsed time: 75.80 secs (1.26 mins)
INFO:birdwatch.matrix_factorization:epoch 100 0.3651295304298401
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2116529941558838
INFO:birdwatch.matrix_factorization:epoch 120 0.3650487959384918
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2116595059633255
INFO:birdwatch.matrix_factorization:Num epochs: 134
INFO:birdwatch.matrix_factorization:epoch 134 0.3650420308113098
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.21166589856147766
INFO:birdwatch.matrix_factorization:Global Intercept: -0.31395378708839417
INFO:birdwatch.scorer:MFGroupScorer_10 Harassment tag consensus elapsed time: 3.75 secs (0.06 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
scoredNotes.loc[:, c.noteCountKey] = 1
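pandas raises SettingWithCopyWarning when code writes into a DataFrame that was itself derived from another one, for example by boolean filtering. In that case pandas cannot tell whether the write was meant to reach the parent. Note that the code here already uses `.loc`, yet the warning still fires, which points to how `scoredNotes` was created. The minimal sketch below assumes `scoredNotes` is such a filtered subset; the frame and its columns are hypothetical. Taking an explicit `.copy()` at creation time makes the intent unambiguous and silences the warning.

```python
import pandas as pd

# Hypothetical parent frame; the real scorer's columns differ.
notes = pd.DataFrame({"noteId": [1, 2, 3, 4], "status": ["CRH", "NMR", "CRH", "CRNH"]})

# Boolean filtering yields a frame pandas may track as derived from `notes`.
# .copy() turns it into an independent DataFrame, so later writes stay local to it.
scoredNotes = notes[notes["status"] != "NMR"].copy()
scoredNotes.loc[:, "noteCount"] = 1  # no SettingWithCopyWarning, parent untouched

print(scoredNotes)
```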
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join(
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed)
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed)
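The "Output mismatch ... result=float64 expected=int64" entries are consistent with a common pandas behavior. When a left or outer join leaves some rows unmatched, the missing cells become NaN. NumPy `int64` cannot represent NaN, so pandas upcasts the column to `float64`. The log marks these as "(allowed)", and this log alone does not confirm that unmatched rows are the cause inside the scorer. The toy frames below are hypothetical. They show the upcast and two ways to get integer columns back.

```python
import pandas as pd

# Hypothetical frames: one author ("b") has no rating counts.
authorCounts = pd.DataFrame({"raterParticipantId": ["a", "b", "c"], "noteCount": [3, 1, 2]})
raterCounts = pd.DataFrame({"raterParticipantId": ["a", "c"], "ratingCount": [10, 4]})

# Left merge: "b" gets NaN for ratingCount, so that column becomes float64.
merged = authorCounts.merge(raterCounts, on="raterParticipantId", how="left")
print(merged.dtypes)

# Option 1: if 0 is the right default, fill the gaps and cast back to int64.
filled = merged.assign(ratingCount=merged["ratingCount"].fillna(0).astype("int64"))

# Option 2: keep the missing value but use pandas' nullable integer dtype.
nullable = merged.astype({"ratingCount": "Int64"})
```

Option 1 changes the meaning of "no data" to "zero", so it only fits when that is the intended semantics. Option 2 keeps the distinction.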
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True)
INFO:birdwatch.scorer:MFGroupScorer_10 Helpfulness scores post-harassment elapsed time: 0.22 secs (0.00 mins)
INFO:birdwatch.matrix_factorization:epoch 80 0.09116222709417343
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06830676645040512
INFO:birdwatch.helpfulness_scores:Unique Raters: 6231
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 30734
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 5431
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 4776
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 492516
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 343933
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 4776, Notes: 43807
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:learning rate set to :0.2
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 7.851096856666743
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 72.01277219430486
INFO:birdwatch.matrix_factorization:epoch 0 0.373954713344574
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.30225273966789246
INFO:birdwatch.matrix_factorization:epoch 20 0.09792043268680573
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06257890909910202
INFO:birdwatch.matrix_factorization:epoch 40 0.09610128402709961
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06438472121953964
INFO:birdwatch.matrix_factorization:epoch 80 0.09104418754577637
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06816454231739044
INFO:birdwatch.matrix_factorization:epoch 60 0.09473082423210144
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06145293265581131
INFO:birdwatch.matrix_factorization:epoch 80 0.09468600898981094
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06161431223154068
INFO:birdwatch.matrix_factorization:epoch 100 0.09466652572154999
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06135568767786026
INFO:birdwatch.matrix_factorization:Num epochs: 103
INFO:birdwatch.matrix_factorization:epoch 103 0.09466741979122162
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06150958314538002
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17643898725509644
INFO:birdwatch.constants:Final round MF elapsed time: 4.69 secs (0.08 mins)
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_10 prescoring, about to call diligence with 343933 final round ratings.
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
None,
raterInitState:
raterParticipantId internalRaterFactor1
0 000D424F8BBD591A0725D5F6F54F78C50C8DC591637C0E... 0.073382
1 002ADDCBF2E4A2F363B766024F866D803ED65C8AF3759C... -0.510768
2 0033B06B2B9E22875E057C84D99E2634127C4291A081B4... -0.320292
3 003FDF9A655454DDED55D10DDC81830B57A59BEED1847D... 0.449039
4 004FF8092304B71DF706338FA263DCACD3EE439A34C930... -0.642459
... ... ...
4771 FFB14685679DE209BD2EB051060B796657AE6158314F58... -0.579395
4772 FFC6993701C48435AB714C158FFD8420268574F35A55EE... -0.096785
4773 FFC7B88FD9AA6574D525D426D7CE13466423DA88D27E19... -0.544985
4774 FFE9E0E39C0049AD113CEF0AB5178393F13B15C4E7B31C... -0.108100
4775 FFF104BC8D2B5E53432FF3E605B5D5D76EDECE29AFA0F5... 0.584097
[4776 rows x 2 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 4776, vs. num we are initializing: 4776
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 4776
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen
INFO:birdwatch.reputation_matrix_factorization:Round 1:
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=16.223921 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.269604 | time=1.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.824924 | time=1.8s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.731604 | time=2.7s
INFO:birdwatch.matrix_factorization:epoch 100 0.09058205783367157
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06791973859071732
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.697063 | time=3.6s
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.678612 | time=4.4s
INFO:birdwatch.matrix_factorization:epoch 100 0.09056269377470016
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06790971755981445
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.666874 | time=5.3s
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.658832 | time=6.1s
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=2.653273 | time=7.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=2.649260 | time=7.8s
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=2.646287 | time=8.7s
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing:
tensor(-0.2673, requires_grad=True)
INFO:birdwatch.reputation_matrix_factorization:
Round 2: learn rater rep (and everything else), freeze note intercept
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=2.646210 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.550133 | time=0.8s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.541141 | time=1.7s
INFO:birdwatch.matrix_factorization:epoch 120 0.09050930291414261
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06783805042505264
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.540339 | time=2.5s
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.540313 | time=2.8s
INFO:birdwatch.reputation_matrix_factorization:
Round 3: fit intercepts and global intercept with everything else frozen
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.538402 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.475103 | time=0.5s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.474280 | time=1.0s
INFO:birdwatch.matrix_factorization:epoch 120 0.09050682187080383
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06784248352050781
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.474247 | time=1.3s
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing:
tensor(0.4904, requires_grad=True)
INFO:birdwatch.diligence_model:Low diligence training loss: 2.6463, 1.5403, 0.4742
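The "Low diligence training loss" summary lines up with the last logged loss of each of the three reputation-MF rounds above, rounded to four decimals. Round 1 ends at 2.646287, round 2 at 1.540313, and round 3 at 0.474247. The hypothetical helper below recovers those per-round values from `epoch=NNN | loss=X | time=Ys` lines. It assumes a new round starts whenever the epoch counter resets to 0. Because the run uses `--parallel`, other workers' lines interleave with these. The regex skips the differently formatted `matrix_factorization` lines, but it would not separate two concurrent reputation-MF runs, such as the one at `time=139.1s` further down.

```python
import re

# Matches reputation-MF progress lines such as "epoch=090 | loss=2.731604 | time=2.7s".
EPOCH_RE = re.compile(r"epoch=(\d+) \| loss=([0-9.]+) \| time=([0-9.]+)s")


def final_loss_per_round(lines):
    """Return the last logged loss of each training round.

    Assumption: a round starts whenever the epoch counter resets to 0.
    """
    finals = []
    for line in lines:
        match = EPOCH_RE.search(line)
        if not match:
            continue  # e.g. "epoch 100 0.0905..." lines from other scorers
        epoch, loss = int(match.group(1)), float(match.group(2))
        if epoch == 0 or not finals:
            finals.append(loss)  # start a new round
        else:
            finals[-1] = loss  # keep the latest loss of the current round


    return finals


# Excerpt of lines from the log above, including one interleaved line from another worker.
log_excerpt = """\
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=16.223921 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=2.646287 | time=8.7s
INFO:birdwatch.matrix_factorization:epoch 100 0.09058205783367157
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=2.646210 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.540313 | time=2.8s
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.538402 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.474247 | time=1.3s
""".splitlines()

print(", ".join(f"{loss:.4f}" for loss in final_loss_per_round(log_excerpt)))
```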
INFO:birdwatch.scorer:MFGroupScorer_10 Low Diligence MF elapsed time: 13.11 secs (0.22 mins)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.61 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.matrix_factorization:epoch 140 0.09049920737743378
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06784924864768982
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.77 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.matrix_factorization:Num epochs: 144
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.62 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.matrix_factorization:epoch 144 0.09049887955188751
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06783787906169891
INFO:birdwatch.matrix_factorization:Global Intercept: 0.136198952794075
INFO:birdwatch.scorer:MFExpansionScorer MF on stable-initialization subset elapsed time: 63.18 secs (1.05 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.61 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins)
INFO:birdwatch.matrix_factorization:epoch 140 0.09049898386001587
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0678447037935257
INFO:birdwatch.matrix_factorization:Num epochs: 142
INFO:birdwatch.matrix_factorization:epoch 142 0.0904988944530487
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06784095615148544
INFO:birdwatch.matrix_factorization:Global Intercept: 0.13643474876880646
INFO:birdwatch.scorer:MFExpansionPlusScorer MF on stable-initialization subset elapsed time: 58.03 secs (0.97 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.73 secs (0.01 mins)
INFO:birdwatch.matrix_factorization:epoch 140 0.10469582676887512
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08087491244077682
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.016384 | time=139.1s
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.73 secs (0.56 mins)
INFO:birdwatch.constants:MFGroupScorer_10: Compute tag thresholds for percentiles elapsed time: 0.79 secs (0.01 mins)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge(
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed)
INFO:birdwatch.run_scoring:MFGroupScorer_9 run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 590255, Notes: 1205894
INFO:birdwatch.matrix_factorization:initializing notes
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py, in _initialize_parameters, at line 180: noteInit = self.noteIdMap.merge(
PandasTypeError: Output mismatch on noteIndex_y: result=float64 expected=int64 (allowed)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:initialized global intercept
INFO:birdwatch.matrix_factorization:learning rate set to :0.2
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 83.49928849467697
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 170.58947573506367
INFO:birdwatch.run_scoring:MFGroupScorer_9 run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFGroupScorer_9 run_scorer_parallelizable: Loading data elapsed time: 19.14 secs (0.32 mins)
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_9 set to: 4
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_9. Original rating length: 118317340
INFO:birdwatch.scorer: Ratings after topic filter: 118317340
INFO:birdwatch.matrix_factorization:epoch 0 0.2504432797431946
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.21773739159107208
INFO:birdwatch.matrix_factorization:epoch 160 0.1046687662601471
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08083144575357437
INFO:birdwatch.matrix_factorization:Num epochs: 170
INFO:birdwatch.matrix_factorization:epoch 170 0.10462913662195206
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08075783401727676
INFO:birdwatch.matrix_factorization:Global Intercept: 0.14974866807460785
INFO:birdwatch.scorer:MFGroupScorer_13 First MF/stable init elapsed time: 460.58 secs (7.68 mins)
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_13
INFO:birdwatch.scorer: Ratings without assigned group: 0
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 747974, Notes: 1296971
INFO:birdwatch.matrix_factorization:initializing notes
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py, in _initialize_parameters, at line 180: noteInit = self.noteIdMap.merge(
PandasTypeError: Output mismatch on noteIndex_y: result=float64 expected=int64 (allowed)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 747994, Notes: 1296974
INFO:birdwatch.matrix_factorization:initializing notes
INFO:birdwatch.matrix_factorization:initializing users
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py, in _initialize_parameters, at line 180: noteInit = self.noteIdMap.merge(
PandasTypeError: Output mismatch on noteIndex_y: result=float64 expected=int64 (allowed)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:initialized global intercept
INFO:birdwatch.matrix_factorization:learning rate set to :0.2
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 89.87686617511109
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 155.84457347447906
INFO:birdwatch.matrix_factorization:initialized global intercept
INFO:birdwatch.matrix_factorization:learning rate set to :0.2
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 89.87831136167726
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 155.8432728070011
INFO:birdwatch.scorer: Ratings after group filter: 5553700
INFO:birdwatch.scorer:MFGroupScorer_9 Filter input elapsed time: 44.48 secs (0.74 mins)
INFO:birdwatch.mf_base_scorer:seeding with 0
INFO:birdwatch.matrix_factorization:epoch 0 0.24713310599327087
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.21589761972427368
INFO:birdwatch.matrix_factorization:epoch 0 0.24708105623722076
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.21586740016937256
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_9: ce8e64445ad8a33f65337cac310965b53c72e2e6aafb0d6d4180524a59252b2a
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note:
Num Ratings: 4864106, Num Unique Notes Rated: 157547, Num Unique Raters: 40204
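The "min 10 ratings per rater and min 5 raters per note" line summarizes a sparsity filter. The two thresholds interact: dropping a sparse note can push a rater below the rating threshold, and dropping that rater can in turn push another note below the rater threshold. One way to enforce both is to repeat the filtering until nothing changes. The sketch below assumes that iterative approach, along with `noteId` and `raterParticipantId` column names. It illustrates the technique and is not the scorer's own `process_data` code, whose exact procedure the log does not show.

```python
import pandas as pd


def filter_sparse_ratings(ratings: pd.DataFrame, min_ratings_per_rater: int = 10,
                          min_raters_per_note: int = 5) -> pd.DataFrame:
    """Iteratively drop notes with too few raters and raters with too few ratings."""
    while True:
        size_before = len(ratings)
        # Keep only notes rated by enough distinct raters.
        raters_per_note = ratings.groupby("noteId")["raterParticipantId"].transform("nunique")
        ratings = ratings[raters_per_note >= min_raters_per_note]
        # Then keep only raters who still have enough ratings.
        ratings_per_rater = ratings.groupby("raterParticipantId")["noteId"].transform("count")
        ratings = ratings[ratings_per_rater >= min_ratings_per_rater]
        if len(ratings) == size_before:  # fixed point: both thresholds now hold
            return ratings


# Toy data with small thresholds: note 3 has one rater; r3 is left with one rating.
toy = pd.DataFrame({
    "raterParticipantId": ["r1", "r1", "r2", "r2", "r3", "r4"],
    "noteId": [1, 2, 1, 2, 1, 3],
})
print(filter_sparse_ratings(toy, min_ratings_per_rater=2, min_raters_per_note=2))
```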
INFO:birdwatch.scorer:MFGroupScorer_9 Prepare ratings elapsed time: 2.38 secs (0.04 mins) | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_9: 86e8c538ad84ecdc7b51d83ace7cd6f79b3ec79d0433a193438b0ff32b773345 | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_9: bab26056c2c51b2beff057f4e0fe86fa071404051fc10055597e708318a26b55 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_9: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 40204, Notes: 157547 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 30.873999504909644 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 120.98562332106258 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.20347785949707 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 5.748543739318848 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.matrix_factorization:epoch 20 0.32802581787109375 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.26613491773605347 | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.62 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.74 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.65 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.66 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.73 secs (0.01 mins)
INFO:birdwatch.matrix_factorization:epoch 40 0.123911552131176
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09102359414100647
INFO:birdwatch.matrix_factorization:epoch 60 0.1044592410326004
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07506925612688065
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.015044 | time=275.7s
INFO:birdwatch.matrix_factorization:epoch 80 0.10081648826599121
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07260991632938385
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.17 secs (0.57 mins)
INFO:birdwatch.scorer:MFGroupScorer_13 Compute scored notes elapsed time: 78.28 secs (1.30 mins)
INFO:birdwatch.matrix_factorization:epoch 100 0.1004866287112236
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0723324567079544
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed)
INFO:birdwatch.matrix_factorization:epoch 20 0.12169932574033737
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09570682793855667
INFO:birdwatch.matrix_factorization:epoch 120 0.1004253551363945
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07231764495372772
INFO:birdwatch.matrix_factorization:epoch 140 0.10041990131139755
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07231421023607254
INFO:birdwatch.note_ratings:Total ratings: 34848627 post-tombstones and 213852 pre-tombstones
INFO:birdwatch.note_ratings:Total ratings created before statuses: 28402760, including 28336976 post-tombstones and 65784 pre-tombstones.
INFO:birdwatch.matrix_factorization:Num epochs: 145
INFO:birdwatch.matrix_factorization:epoch 145 0.10041949898004532
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0723150447010994
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1687842160463333
INFO:birdwatch.scorer:MFGroupScorer_9 First MF/stable init elapsed time: 90.45 secs (1.51 mins)
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_9
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
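The compute_note_stats merge above reports numRatings and numRatingsLast28 coming back as float64 instead of int64, and the type checker marks this as allowed. A common cause is a left merge in which some keys have no match: the unmatched rows receive NaN, and a NumPy int64 column cannot hold NaN, so pandas upcasts it to float64. The sketch below reproduces that with small hypothetical frames (the column names mirror the log, the data is invented) and shows two ways to keep integer counts. It illustrates the pandas behaviour only; it is not the scorer's code.

```python
import pandas as pd

# Hypothetical frames: every note is listed in noteStats, but only
# notes 1 and 2 have rows in ratingCounts.
noteStats = pd.DataFrame({"noteId": [1, 2, 3]})
ratingCounts = pd.DataFrame({"noteId": [1, 2], "numRatings": [5, 7]})

# Left merge: note 3 has no match, so its numRatings is NaN, and an
# int64 column that must hold NaN is upcast to float64.
merged = noteStats.merge(ratingCounts, on="noteId", how="left")
print(merged["numRatings"].dtype)  # float64

# Option 1: use pandas' nullable integer dtype, which stores <NA> instead of NaN.
nullable = noteStats.merge(
    ratingCounts.astype({"numRatings": "Int64"}), on="noteId", how="left"
)
print(nullable["numRatings"].dtype)  # Int64

# Option 2: if "no ratings" should simply be 0, fill and cast back.
filled = merged.assign(numRatings=merged["numRatings"].fillna(0).astype("int64"))
print(filled["numRatings"].tolist())  # [5, 7, 0]
```

Which option fits depends on whether a missing count must stay distinguishable from a genuine zero.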
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.65 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.76 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins)
INFO:birdwatch.note_ratings:Total valid ratings: 1378878
INFO:birdwatch.scorer:MFGroupScorer_13 Compute valid ratings elapsed time: 46.50 secs (0.78 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.75 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
scoredNotes.loc[:, c.noteCountKey] = 1
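The SettingWithCopyWarning above is raised at helpfulness_scores.py:31, where scoredNotes.loc[:, c.noteCountKey] = 1 writes into a frame that pandas suspects may be a slice of another DataFrame. A minimal sketch, assuming scoredNotes is produced by filtering a larger frame (allNotes and its values are hypothetical): taking an explicit .copy() when the subset is created makes it an independent DataFrame, so the column assignment is unambiguous and pandas has nothing to warn about.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for the frame that scoredNotes is derived from.
allNotes = pd.DataFrame({"noteId": [1, 2, 3], "crhBool": [1, 0, 1]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Without .copy(), pandas 2.x may flag the filtered subset as a possible
    # view of allNotes and emit SettingWithCopyWarning on the assignment below.
    # The explicit copy makes scoredNotes independent of allNotes.
    scoredNotes = allNotes[allNotes["crhBool"] == 1].copy()
    scoredNotes.loc[:, "noteCount"] = 1

print(len(caught))  # 0
print(scoredNotes["noteCount"].tolist())  # [1, 1]
print("noteCount" in allNotes.columns)  # False
```

The copy costs memory proportional to the subset, which is usually the right trade when the subset is about to be modified anyway.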
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.64 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.73 secs (0.01 mins)
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join(
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed)
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed)
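The authorCounts.join in compute_general_helpfulness_scores reports the same float64 pattern for its count columns, plus an index that comes back as object. A sketch under stated assumptions: suppose authorCounts is outer-joined with a per-rater table keyed by string ids (both frames below are hypothetical; only the column names noteCount and ratingCount come from the log). Ids present on only one side get NaN for the other side's columns, and with pandas 2.x string labels produce an object-dtype index. Filling with 0 and casting restores integer counts when a missing entry genuinely means none.

```python
import pandas as pd

# Hypothetical per-participant tables indexed by string ids.
# "A1" only authored notes, "C3" only rated, "B2" did both.
authorCounts = pd.DataFrame({"noteCount": [3, 1]}, index=["A1", "B2"])
raterCounts = pd.DataFrame({"ratingCount": [10, 4]}, index=["B2", "C3"])

# Outer join: the result index is the sorted union of both indexes, and
# each column gets NaN for ids missing on its side, so int64 becomes float64.
joined = authorCounts.join(raterCounts, how="outer")
print(joined.dtypes.tolist())

# If a missing count means "none", fill and restore integer columns.
counts = joined.fillna(0).astype("int64")
print(counts.loc["C3", "noteCount"])  # 0
```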
INFO:birdwatch.scorer:MFGroupScorer_13 Helpfulness scores pre-harassment elapsed time: 1.92 secs (0.03 mins)
INFO:birdwatch.matrix_factorization:epoch 20 0.12331057339906693
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09608878940343857
INFO:birdwatch.matrix_factorization:epoch 20 0.12331555783748627
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09608767926692963
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.86 secs (0.56 mins)
INFO:birdwatch.scorer:MFGroupScorer_9 Compute scored notes elapsed time: 45.71 secs (0.76 mins)
INFO:birdwatch.helpfulness_scores:Unique Raters: 160470
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 222125
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 118794
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed)
INFO:birdwatch.note_ratings:Total ratings: 5552738 post-tombstones and 962 pre-tombstones
INFO:birdwatch.note_ratings:Total ratings created before statuses: 4419036, including 4419031 post-tombstones and 5 pre-tombstones.
INFO:birdwatch.note_ratings:Total valid ratings: 361283
INFO:birdwatch.scorer:MFGroupScorer_9 Compute valid ratings elapsed time: 6.52 secs (0.11 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
scoredNotes.loc[:, c.noteCountKey] = 1
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join(
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed)
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed)
INFO:birdwatch.scorer:MFGroupScorer_9 Helpfulness scores pre-harassment elapsed time: 0.55 secs (0.01 mins)
INFO:birdwatch.helpfulness_scores:Unique Raters: 40204
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 89428
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 34424
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 32174
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 4864106
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 4112847
INFO:birdwatch.scorer:MFGroupScorer_9 Filtering by helpfulness score elapsed time: 6.65 secs (0.11 mins)
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse-------------------
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel
0 2702158
1 184698
dtype: int64
INFO:birdwatch.tag_consensus:Number of rows with no tag label 1225991
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 112054
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 34070222
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 26309634
INFO:birdwatch.scorer:MFGroupScorer_13 Filtering by helpfulness score elapsed time: 49.82 secs (0.83 mins)
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse-------------------
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note:
Num Ratings: 2733382, Num Unique Notes Rated: 103315, Num Unique Raters: 30776
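The line above reports the effect of a minimum-support filter applied before tag-consensus training: raters with fewer than 10 ratings and notes with fewer than 5 raters are dropped. The log does not show how the filter is implemented. A minimal sketch, assuming a long-format ratings table with raterParticipantId and noteId columns, and assuming (not confirmed by the log) that both thresholds are reapplied until a pass removes nothing, since removing a sparse rater can push a note under its minimum and vice versa. The function name filter_min_counts is hypothetical.

```python
import pandas as pd


def filter_min_counts(
    ratings: pd.DataFrame,
    min_ratings_per_rater: int = 10,
    min_raters_per_note: int = 5,
) -> pd.DataFrame:
    """Drop ratings from sparse raters and on sparsely rated notes.

    Assumption: the two thresholds are reapplied until a pass removes
    nothing, because removing a rater can push a note below its minimum
    and vice versa.
    """
    while True:
        # Ratings per rater, broadcast back onto every rating row.
        per_rater = ratings.groupby("raterParticipantId")["noteId"].transform("count")
        # Distinct raters per note, broadcast the same way.
        per_note = ratings.groupby("noteId")["raterParticipantId"].transform("nunique")
        keep = (per_rater >= min_ratings_per_rater) & (per_note >= min_raters_per_note)
        if keep.all():
            return ratings
        ratings = ratings[keep]


# Tiny example with thresholds of 2: rater c and note w fall out in the
# first pass, which leaves rater d and note z under threshold in the second.
toy = pd.DataFrame(
    {"raterParticipantId": list("aabbcdd"), "noteId": list("xyxyzzw")}
)
print(filter_min_counts(toy, min_ratings_per_rater=2, min_raters_per_note=2))
```

A single pass would be cheaper, but it can leave raters or notes that no longer meet the thresholds once their partners are removed; the loop trades a few extra passes for a result that satisfies both constraints.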
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel
0 2571590
1 161792
dtype: int64
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.05919114123090004
INFO:birdwatch.matrix_factorization:Using pos weight: 15.894419996044302 with BCEWithLogitsLoss
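The positive rate and pos weight above can be reproduced from the post-filtering label breakdown: 161792 positives out of 2571590 + 161792 = 2733382 labelled rows (the same figure as Num Ratings after filtering), and a pos weight equal to negatives divided by positives, a common way to set pos_weight for PyTorch's BCEWithLogitsLoss on an imbalanced label. This relationship is inferred from the arithmetic matching the logged values, not taken from the scorer's source; tag_label_stats is a hypothetical helper.

```python
def tag_label_stats(num_negative: int, num_positive: int) -> tuple[float, float]:
    """Return (positive_rate, pos_weight) for a binary tag label.

    positive_rate is the share of labelled rows that are positive;
    pos_weight is negatives / positives, i.e. how much each positive
    example is up-weighted so both classes contribute comparably to the loss.
    """
    total = num_negative + num_positive
    positive_rate = num_positive / total
    pos_weight = num_negative / num_positive
    return positive_rate, pos_weight


# Post-filtering breakdowns from the two notHelpfulSpamHarassmentOrAbuse runs in this log.
for negatives, positives in [(2571590, 161792), (14399887, 1444807)]:
    rate, weight = tag_label_stats(negatives, positives)
    print(f"positive rate={rate:.6f} pos weight={weight:.6f}")
```

Both pairs agree with the positive rates and pos weights logged for the two runs (0.0592 / 15.894 and 0.0912 / 9.967).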
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 30776, Notes: 103315
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
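The FutureWarnings from matrix_factorization.py:187 and :193 (and the same pattern at helpfulness_scores.py:173) flag calls of the form noteInit[col].fillna(0.0, inplace=True). As the warning text says, from pandas 3.0 the intermediate noteInit[col] always behaves as a copy, so the in-place fill would never reach noteInit. The sketch below applies the two replacements the warning itself suggests. The frame is hypothetical, and the column names internalNoteIntercept and internalNoteFactor1 are assumed to be what the c.* keys resolve to, based on the names printed elsewhere in this log.

```python
import pandas as pd

# Hypothetical note-initialisation frame with missing values to fill.
noteInit = pd.DataFrame(
    {"internalNoteIntercept": [0.3, None, -0.1], "internalNoteFactor1": [None, 0.5, 0.2]}
)

# Replacement 1: assign the filled Series back to the column,
# i.e. df[col] = df[col].method(value).
noteInit["internalNoteIntercept"] = noteInit["internalNoteIntercept"].fillna(0.0)

# Replacement 2: call fillna on the DataFrame itself with a per-column mapping,
# i.e. df.method({col: value}, inplace=True).
noteInit.fillna({"internalNoteFactor1": 0.0}, inplace=True)

print(noteInit.isna().sum().sum())  # 0
```

Both forms modify noteInit directly, so they behave the same with and without Copy-on-Write.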
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:learning rate set to :2.0
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 26.456777815418864
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 88.81537561736418
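The two density figures above are consistent with the total number of ratings divided by the number of notes and by the number of users in the training matrix, using the counts reported for this run (2733382 ratings, 103315 notes, 30776 users). The same relationship holds for the other matrix-factorization runs in this section, for example 3153184 final-training ratings over 157405 notes and 28355 users. The sketch below shows that arithmetic; it is inferred from the matching numbers rather than from the scorer's source, and ratings_density is a hypothetical name.

```python
def ratings_density(num_ratings: int, num_notes: int, num_users: int) -> tuple[float, float]:
    """Return (ratings per note, ratings per user) for a ratings matrix."""
    return num_ratings / num_notes, num_ratings / num_users


# (ratings, notes, users) for three matrix-factorization runs logged in this section.
runs = [
    (2733382, 103315, 30776),
    (15844694, 450347, 107088),
    (3153184, 157405, 28355),
]
for ratings, notes, users in runs:
    per_note, per_user = ratings_density(ratings, notes, users)
    print(f"{per_note:.4f} ratings/note, {per_user:.4f} ratings/user")
```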
INFO:birdwatch.matrix_factorization:epoch 0 3.364053726196289
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.4735115766525269
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel
0 14773810
1 1496958
dtype: int64
INFO:birdwatch.tag_consensus:Number of rows with no tag label 9968271
INFO:birdwatch.matrix_factorization:epoch 20 0.7234067916870117
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.42424485087394714
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note:
Num Ratings: 15844694, Num Unique Notes Rated: 450347, Num Unique Raters: 107088
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel
0 14399887
1 1444807
dtype: int64
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0
INFO:birdwatch.matrix_factorization:epoch 40 0.4569048583507538
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2980141341686249
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.09118554135535846
INFO:birdwatch.matrix_factorization:Using pos weight: 9.966650909083357 with BCEWithLogitsLoss
INFO:birdwatch.matrix_factorization:epoch 60 0.4193721413612366
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2851749360561371
INFO:birdwatch.matrix_factorization:epoch 80 0.4144599139690399
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.28278297185897827
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 107088, Notes: 450347
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:learning rate set to :2.0
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 35.183300876879386
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 147.95956596444046
INFO:birdwatch.matrix_factorization:epoch 0 3.1848554611206055
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.3389382362365723
INFO:birdwatch.matrix_factorization:epoch 100 0.41376984119415283
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.28245529532432556
INFO:birdwatch.matrix_factorization:epoch 120 0.41366851329803467
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.28244003653526306
INFO:birdwatch.matrix_factorization:epoch 40 0.11224709451198578
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08672188967466354
INFO:birdwatch.matrix_factorization:Num epochs: 126
INFO:birdwatch.matrix_factorization:epoch 126 0.4136626720428467
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.28242769837379456
INFO:birdwatch.matrix_factorization:Global Intercept: -0.22359012067317963
INFO:birdwatch.scorer:MFGroupScorer_9 Harassment tag consensus elapsed time: 47.29 secs (0.79 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
scoredNotes.loc[:, c.noteCountKey] = 1
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join(
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed)
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True)
INFO:birdwatch.scorer:MFGroupScorer_9 Helpfulness scores post-harassment elapsed time: 1.02 secs (0.02 mins)
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=0.014982 | time=425.4s
INFO:birdwatch.helpfulness_scores:Unique Raters: 40204
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 89428
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 30605
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 28355
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 4864106
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 3153184
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 28355, Notes: 157405
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:learning rate set to :0.2
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 20.032298846923542
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 111.20380885205431
INFO:birdwatch.matrix_factorization:epoch 0 0.3978501260280609
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3303650915622711
INFO:birdwatch.matrix_factorization:epoch 20 0.10245437920093536
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07018935680389404
INFO:birdwatch.matrix_factorization:epoch 20 0.682984471321106
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3960767686367035
INFO:birdwatch.matrix_factorization:epoch 40 0.09849506616592407
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07023423910140991
INFO:birdwatch.matrix_factorization:epoch 60 0.09785455465316772
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06930186599493027
INFO:birdwatch.matrix_factorization:epoch 80 0.09767352044582367
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06839799880981445
INFO:birdwatch.matrix_factorization:Num epochs: 82
INFO:birdwatch.matrix_factorization:epoch 82 0.09767346829175949
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06861159205436707
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17108562588691711
INFO:birdwatch.constants:Final round MF elapsed time: 42.30 secs (0.71 mins)
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_9 prescoring, about to call diligence with 3153184 final round ratings.
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
None,
raterInitState:
raterParticipantId internalRaterFactor1
0 00029D1FDD352D79B5073189C3F2BDF6377581F50D66C1... -0.626822
1 000415A1E3D1DA95BD626E1D938E4A9AFFB446D1A7D532... 0.670846
2 00041B33023A7D5BCE252803A32E50E9AFCC1584F63ED4... -0.257271
3 0005FD5ECF92B548D17E663347D5E696806076F75457A1... -0.361922
4 000929DF3AFDB652A896FC0BA7FF91D9FBF4F3214D8392... -0.486622
... ... ...
28350 FFF69B7E7ACFBB1E413F8B85384A9EB245A8D8B85F76C9... 0.006755
28351 FFF771FF9CA763466ADA4DA853867E7371DEE6D71C50CB... -0.333397
28352 FFFDAB98EE31EC0CC51169937F859D5B676870C6470C19... 0.471107
28353 FFFEB3E291D915645E08FD13A9BFE66B5912FE45306D25... -0.326330
28354 FFFF8C877BDC3CEFEFD0D4C5F0E8B4BE537F5023A1F31F... -0.519399
[28355 rows x 2 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 28355, vs. num we are initializing: 28355
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 28355
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen
INFO:birdwatch.reputation_matrix_factorization:Round 1:
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=17.572601 | time=0.1s
INFO:birdwatch.matrix_factorization:epoch 40 0.4453316330909729
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.31214165687561035
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.543847 | time=8.2s
INFO:birdwatch.matrix_factorization:epoch 40 0.11341135948896408
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08803565800189972
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.131739 | time=16.8s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.070398 | time=24.3s
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.054358 | time=31.7s
INFO:birdwatch.matrix_factorization:epoch 60 0.4118565618991852
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.29874590039253235
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.047830 | time=39.9s
INFO:birdwatch.matrix_factorization:epoch 40 0.11341220140457153
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08803749829530716
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.044351 | time=49.2s
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.042202 | time=58.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.040705 | time=67.6s
INFO:birdwatch.matrix_factorization:epoch 80 0.40714937448501587
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.29668211936950684
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.039624 | time=75.3s
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.038874 | time=84.0s
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing:
tensor(0.7073, requires_grad=True)
INFO:birdwatch.reputation_matrix_factorization:
Round 2: learn rater rep (and everything else), freeze note intercept
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.038853 | time=0.0s
INFO:birdwatch.matrix_factorization:epoch 60 0.1109887957572937
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0859370306134224
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.861498 | time=7.6s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.850914 | time=14.8s
INFO:birdwatch.matrix_factorization:epoch 100 0.40649089217185974
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.29630595445632935
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.850127 | time=22.2s
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.850102 | time=24.6s
INFO:birdwatch.reputation_matrix_factorization:
Round 3: fit intercepts and global intercept with everything else frozen
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.408697 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.336647 | time=4.7s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.335764 | time=9.3s
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=0.014979 | time=589.9s
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.335727 | time=11.3s
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing:
tensor(1.8712, requires_grad=True)
INFO:birdwatch.diligence_model:Low diligence training loss: 3.0389, 1.8501, 0.3357
INFO:birdwatch.scorer:MFGroupScorer_9 Low Diligence MF elapsed time: 123.59 secs (2.06 mins)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.64 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.77 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.63 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.65 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.71 secs (0.01 mins)
INFO:birdwatch.matrix_factorization:epoch 120 0.40638047456741333
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2962661385536194
INFO:birdwatch.matrix_factorization:epoch 60 0.11200405657291412
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08671385049819946
INFO:birdwatch.matrix_factorization:epoch 140 0.40635645389556885
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.29625847935676575
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 35.64 secs (0.59 mins) | |
INFO:birdwatch.matrix_factorization:Num epochs: 145 | |
INFO:birdwatch.matrix_factorization:epoch 145 0.40635353326797485 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.29625794291496277 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.20274947583675385 | |
INFO:birdwatch.scorer:MFGroupScorer_13 Harassment tag consensus elapsed time: 260.94 secs (4.35 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=130 | loss=0.014979 | time=642.2s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(0.4173, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.172573 | time=0.8s | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_13 Helpfulness scores post-harassment elapsed time: 5.71 secs (0.10 mins) | |
INFO:birdwatch.constants:MFGroupScorer_9: Compute tag thresholds for percentiles elapsed time: 7.36 secs (0.12 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.run_scoring:MFGroupScorer_8 run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.matrix_factorization:epoch 60 0.11200381815433502
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08671258389949799
INFO:birdwatch.run_scoring:MFGroupScorer_8 run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFGroupScorer_8 run_scorer_parallelizable: Loading data elapsed time: 20.34 secs (0.34 mins)
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_8 set to: 4
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_8. Original rating length: 118317340
INFO:birdwatch.scorer: Ratings after topic filter: 118317340
INFO:birdwatch.helpfulness_scores:Unique Raters: 160470
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 222125
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 109801
INFO:birdwatch.matrix_factorization:epoch 80 0.11082334816455841
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08564222604036331
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 103061
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 34070222
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 19811605
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 103061, Notes: 607519
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:learning rate set to :0.2
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 32.61067555088812
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 192.23183357429096
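The "Ratings per note" and "Ratings per user" lines are averages over the final training set: total ratings divided by the number of notes or users. Plugging in the counts logged just above (19811605 ratings for final training, 103061 users, 607519 notes) reproduces both figures. The helper name is hypothetical; the log does not show how the scorer computes them.

```python
def ratings_density(num_ratings: int, num_notes: int, num_users: int) -> tuple[float, float]:
    """Average ratings per note and per user in a training set (hypothetical helper)."""
    return num_ratings / num_notes, num_ratings / num_users


per_note, per_user = ratings_density(num_ratings=19_811_605, num_notes=607_519, num_users=103_061)
print(f"{per_note:.4f} {per_user:.4f}")  # 32.6107 192.2318
```

The same ratio holds for the smaller group-scorer runs in this log, e.g. 275946 ratings over 33053 notes and 3306 users gives 8.3486 and 83.4682.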
INFO:birdwatch.scorer: Ratings without assigned group: 0
INFO:birdwatch.matrix_factorization:epoch 0 0.4033586382865906
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.343523234128952
INFO:birdwatch.scorer: Ratings after group filter: 755067
INFO:birdwatch.scorer:MFGroupScorer_8 Filter input elapsed time: 44.72 secs (0.75 mins)
INFO:birdwatch.mf_base_scorer:seeding with 0
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_8: e38b8c9cc50be625183f1ebfbfbdc7fe3b9dd38afa4f8d3072d6cf5a9f01e928
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note:
Num Ratings: 275946, Num Unique Notes Rated: 33053, Num Unique Raters: 3306
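The process_data message reports the training set left after requiring at least 10 ratings per rater and at least 5 raters per note. The sketch below applies those two thresholds in a single pass, notes first and then raters, on (rater, note) pairs. The pass order and the single-pass choice are assumptions, since the log shows only the thresholds and the resulting counts; an iterative variant would repeat both steps until neither count changes.

```python
from collections import Counter


def filter_ratings(pairs, min_ratings_per_rater=10, min_raters_per_note=5):
    """Keep (rater, note) pairs meeting both thresholds; a single-pass sketch."""
    # Step 1: drop notes rated by fewer than min_raters_per_note distinct raters.
    raters_per_note = Counter(note for _, note in set(pairs))
    kept = [(r, n) for r, n in pairs if raters_per_note[n] >= min_raters_per_note]
    # Step 2: drop raters with fewer than min_ratings_per_rater surviving ratings.
    ratings_per_rater = Counter(r for r, _ in kept)
    return [(r, n) for r, n in kept if ratings_per_rater[r] >= min_ratings_per_rater]


# Usage with tiny thresholds so the effect is visible on toy data.
pairs = [("r1", "n1"), ("r2", "n1"), ("r1", "n2"), ("r2", "n2"), ("r3", "n1")]
kept = filter_ratings(pairs, min_ratings_per_rater=2, min_raters_per_note=2)
# Summary in the same shape as the log: ratings, unique notes, unique raters.
print(len(kept), len({n for _, n in kept}), len({r for r, _ in kept}))  # 4 2 2
```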
INFO:birdwatch.scorer:MFGroupScorer_8 Prepare ratings elapsed time: 0.22 secs (0.00 mins)
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_8: 5ba7b92ddf9b0604490781d94cf5fb503c3712d4557bb9fe0141378f65bb5b36
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_8: bab26056c2c51b2beff057f4e0fe86fa071404051fc10055597e708318a26b55
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_8: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 3306, Notes: 33053
INFO:birdwatch.matrix_factorization:learning rate set to :1.0
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 8.348591655825492
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 83.46823956442832
INFO:birdwatch.matrix_factorization:epoch 0 6.579975605010986
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.086886405944824
INFO:birdwatch.matrix_factorization:epoch 20 0.3220418095588684
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.24266822636127472
INFO:birdwatch.matrix_factorization:epoch 40 0.1495606005191803
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11007282882928848
INFO:birdwatch.matrix_factorization:epoch 60 0.1065634936094284
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07200613617897034
INFO:birdwatch.matrix_factorization:epoch 80 0.10056588053703308
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06800234317779541
INFO:birdwatch.matrix_factorization:epoch 100 0.09981906414031982
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06731637567281723
INFO:birdwatch.matrix_factorization:epoch 120 0.09971969574689865
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06720536947250366
INFO:birdwatch.matrix_factorization:epoch 140 0.09970708191394806
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06720436364412308
INFO:birdwatch.matrix_factorization:Num epochs: 148
INFO:birdwatch.matrix_factorization:epoch 148 0.09970612078905106
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06720240414142609
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17383065819740295
INFO:birdwatch.scorer:MFGroupScorer_8 First MF/stable init elapsed time: 5.25 secs (0.09 mins)
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_8
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.66 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.74 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.64 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.73 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.19 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.83 secs (0.01 mins)
INFO:birdwatch.matrix_factorization:epoch 20 0.10804815590381622
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07902276515960693
INFO:birdwatch.matrix_factorization:epoch 80 0.11184205114841461
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08650998771190643
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.44 secs (0.57 mins)
INFO:birdwatch.scorer:MFGroupScorer_8 Compute scored notes elapsed time: 41.96 secs (0.70 mins)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed)
INFO:birdwatch.note_ratings:Total ratings: 754838 post-tombstones and 229 pre-tombstones
INFO:birdwatch.note_ratings:Total ratings created before statuses: 603028, including 603028 post-tombstones and 0 pre-tombstones.
INFO:birdwatch.note_ratings:Total valid ratings: 27559
INFO:birdwatch.scorer:MFGroupScorer_8 Compute valid ratings elapsed time: 1.12 secs (0.02 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
scoredNotes.loc[:, c.noteCountKey] = 1
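The SettingWithCopyWarning at helpfulness_scores.py line 31 is pandas flagging a column assignment on a frame that may be a view of another frame, typically one produced by boolean filtering. Note that the flagged statement already uses `.loc`, so the suggestion in the message does not help by itself; taking an explicit `.copy()` when the subset is created makes the later assignment unambiguous. How `scoredNotes` is actually built is not visible in the log, so the filter and the `status` column below are hypothetical stand-ins.

```python
import pandas as pd


def add_note_count(notes: pd.DataFrame, status: str) -> pd.DataFrame:
    # Hypothetical filter: keep notes with the given status. The .copy() makes the
    # subset an independent DataFrame rather than a possible view of `notes`.
    scored_notes = notes[notes["status"] == status].copy()
    # Assigning on the copy raises no SettingWithCopyWarning and leaves `notes` untouched.
    scored_notes.loc[:, "noteCount"] = 1
    return scored_notes


notes = pd.DataFrame({"noteId": [1, 2, 3], "status": ["CRH", "NMR", "CRH"]})
print(add_note_count(notes, "CRH")["noteId"].tolist())  # [1, 3]
print("noteCount" in notes.columns)  # False
```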
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join(
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed)
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed)
INFO:birdwatch.scorer:MFGroupScorer_8 Helpfulness scores pre-harassment elapsed time: 0.14 secs (0.00 mins)
INFO:birdwatch.helpfulness_scores:Unique Raters: 3306
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 22616
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 3147
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 2844
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 275946
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 257245
INFO:birdwatch.scorer:MFGroupScorer_8 Filtering by helpfulness score elapsed time: 0.33 secs (0.01 mins)
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse-------------------
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel
0 154971
1 10814
dtype: int64
INFO:birdwatch.tag_consensus:Number of rows with no tag label 91460
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note:
Num Ratings: 110974, Num Unique Notes Rated: 14345, Num Unique Raters: 2210
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel
0 105610
1 5364
dtype: int64
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.04833564618739525
INFO:birdwatch.matrix_factorization:Using pos weight: 19.688665175242356 with BCEWithLogitsLoss
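The positive rate and the BCEWithLogitsLoss pos weight follow from the post-filtering label breakdown just above: 5364 positive and 105610 negative labels out of 110974 ratings. The logged weight equals negatives divided by positives. The sketch recomputes both numbers and writes out the weighted binary cross-entropy with logits in the form PyTorch documents for `pos_weight`, using only the standard library. Deriving the weight from the counts this way is inferred from the numbers, not read from the scorer's source.

```python
import math


def pos_weight_from_counts(num_pos: int, num_neg: int) -> tuple[float, float]:
    """Return (positive rate, negatives-per-positive weight)."""
    return num_pos / (num_pos + num_neg), num_neg / num_pos


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def weighted_bce_with_logits(logit: float, label: int, pos_weight: float) -> float:
    # -[pos_weight * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))],
    # using log(sigmoid(x)) = -softplus(-x) and log(1 - sigmoid(x)) = -softplus(x).
    return pos_weight * label * softplus(-logit) + (1 - label) * softplus(logit)


rate, weight = pos_weight_from_counts(num_pos=5364, num_neg=105610)
print(f"{rate:.6f} {weight:.6f}")  # 0.048336 19.688665
```

With roughly 1 positive per 20 labels, the weight makes each positive example count about as much as the negatives combined, so the model cannot minimize loss by predicting "no harassment" everywhere.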
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 2210, Notes: 14345
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:learning rate set to :2.0
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 7.73607528755664
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 50.21447963800905
INFO:birdwatch.matrix_factorization:epoch 0 3.1716248989105225
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.267361044883728
INFO:birdwatch.matrix_factorization:epoch 20 0.6471165418624878
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.29546457529067993
INFO:birdwatch.matrix_factorization:epoch 40 0.39620450139045715
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.21435728669166565
INFO:birdwatch.matrix_factorization:epoch 60 0.36399292945861816
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2028387039899826
INFO:birdwatch.matrix_factorization:epoch 80 0.3589763641357422
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.20143675804138184
INFO:birdwatch.matrix_factorization:epoch 100 0.35827311873435974
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.20115342736244202
INFO:birdwatch.matrix_factorization:Num epochs: 103
INFO:birdwatch.matrix_factorization:epoch 103 0.3582633435726166
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.20116233825683594
INFO:birdwatch.matrix_factorization:Global Intercept: -0.3194645345211029
INFO:birdwatch.scorer:MFGroupScorer_8 Harassment tag consensus elapsed time: 1.75 secs (0.03 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
scoredNotes.loc[:, c.noteCountKey] = 1
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join(
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed)
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True)
INFO:birdwatch.scorer:MFGroupScorer_8 Helpfulness scores post-harassment elapsed time: 0.17 secs (0.00 mins)
INFO:birdwatch.helpfulness_scores:Unique Raters: 3306
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 22616
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 2964
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 2661
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 275946
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 221134
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 2661, Notes: 33041
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:learning rate set to :0.2
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 6.692715111528101
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 83.10184141300263
INFO:birdwatch.matrix_factorization:epoch 0 0.37918394804000854
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.30834564566612244
INFO:birdwatch.matrix_factorization:epoch 20 0.10088010132312775
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06582921743392944
INFO:birdwatch.matrix_factorization:epoch 40 0.09664016962051392
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06504671275615692
INFO:birdwatch.matrix_factorization:epoch 60 0.09514918178319931
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.061900824308395386
INFO:birdwatch.matrix_factorization:epoch 80 0.09505777060985565
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06201479583978653
INFO:birdwatch.matrix_factorization:epoch 100 0.09502474218606949
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06188756600022316
INFO:birdwatch.matrix_factorization:Num epochs: 102
INFO:birdwatch.matrix_factorization:epoch 102 0.0950261726975441
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.061940036714076996
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17209193110466003
INFO:birdwatch.constants:Final round MF elapsed time: 3.72 secs (0.06 mins)
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_8 prescoring, about to call diligence with 221134 final round ratings.
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
None,
raterInitState:
raterParticipantId internalRaterFactor1
0 0026E9A04A48A9CF87EA5FA9499883B8868F322F089686... -0.484264
1 002CE9F3E04AE7DC8B2629A4C755E7120416A9AB7BDF34... -0.811401
2 002E8C0F3F6321C14A72393D1A7CB72049853C81110CAA... -0.359472
3 004DF35A540C1F2CFC12C89E8F0CA622480A4F0A52123C... -0.455774
4 00506BFAD47756108668671B68A5FCCA78046636D92B76... 0.273995
... ... ...
2656 FF97899D2A4EEDBDCD42BA1004D5D696AD069094217867... -0.410498
2657 FF98EA5358D2281496E24195141FA88EB6337C53188146... -0.065290
2658 FFA64E61F9B012016BB7ACCFE2FF2E42D57BB570E94452... 0.798207
2659 FFAA122DB59243500CA1C39E0536AAA151881CBD989683... -0.355223
2660 FFB650E9ECB211EBA618F520B9CDD0F1624C22A71BA73D... -0.008486
[2661 rows x 2 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 2661, vs. num we are initializing: 2661
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 2661
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen
INFO:birdwatch.reputation_matrix_factorization:Round 1:
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=16.732342 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.260558 | time=1.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.796650 | time=1.8s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.690283 | time=2.5s
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.648711 | time=3.3s
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.625763 | time=4.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.610591 | time=4.8s
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.599524 | time=5.4s
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=2.591774 | time=6.1s
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=2.586322 | time=6.8s
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=2.582575 | time=7.5s
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing:
tensor(-0.3650, requires_grad=True)
INFO:birdwatch.reputation_matrix_factorization:
Round 2: learn rater rep (and everything else), freeze note intercept
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=2.582471 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.480372 | time=0.7s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.471387 | time=1.4s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.470443 | time=2.1s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.470443 | time=2.1s
INFO:birdwatch.reputation_matrix_factorization:
Round 3: fit intercepts and global intercept with everything else frozen
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.560512 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.092619 | time=142.4s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.501221 | time=0.4s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.500428 | time=0.7s
INFO:birdwatch.reputation_matrix_factorization:epoch=070 | loss=0.500402 | time=0.9s
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing:
tensor(0.2465, requires_grad=True)
INFO:birdwatch.diligence_model:Low diligence training loss: 2.5826, 1.4704, 0.5004
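The reputation_matrix_factorization rounds train one model three times with different parameter groups frozen, and the three numbers in the diligence_model summary match the last logged loss of each round (2.582575, 1.470443, 0.500402). Below is a minimal pure-Python sketch of that staging idea for the basic Community Notes model ŷ = μ + i_u + i_n + f_u·f_n with a single factor, using full-batch gradient descent on squared error. It is not the scorer's implementation: the rater-reputation term, the real loss and optimizer, and the exact round schedule are not reproduced, and freezing the rater factors in round 1 is an illustrative stand-in for the frozen reputation.

```python
def predict(p, u, n):
    # Basic model: global intercept + rater intercept + note intercept + factor product.
    return p["mu"] + p["i_u"][u] + p["i_n"][n] + p["f_u"][u] * p["f_n"][n]


def mse(p, ratings):
    return sum((predict(p, u, n) - y) ** 2 for u, n, y in ratings) / len(ratings)


def train_stage(p, ratings, trainable, lr=0.05, epochs=200):
    """Full-batch gradient descent on MSE, updating only the `trainable` groups."""
    for _ in range(epochs):
        grads = {"mu": 0.0, "i_u": [0.0] * len(p["i_u"]), "i_n": [0.0] * len(p["i_n"]),
                 "f_u": [0.0] * len(p["f_u"]), "f_n": [0.0] * len(p["f_n"])}
        for u, n, y in ratings:
            err = 2 * (predict(p, u, n) - y) / len(ratings)
            grads["mu"] += err
            grads["i_u"][u] += err
            grads["i_n"][n] += err
            grads["f_u"][u] += err * p["f_n"][n]
            grads["f_n"][n] += err * p["f_u"][u]
        # Frozen groups are simply skipped, like tensors with requires_grad=False.
        for name in trainable:
            if name == "mu":
                p["mu"] -= lr * grads["mu"]
            else:
                p[name] = [v - lr * g for v, g in zip(p[name], grads[name])]
    return mse(p, ratings)


# Toy data: (rater index, note index, helpful label).
ratings = [(0, 0, 1.0), (0, 1, 0.0), (1, 0, 1.0), (1, 1, 1.0), (2, 1, 0.0)]
params = {"mu": 0.0, "i_u": [0.0] * 3, "i_n": [0.0] * 2, "f_u": [0.5, -0.3, 0.2], "f_n": [0.1, 0.4]}
start = mse(params, ratings)
f_u_round0 = list(params["f_u"])
loss1 = train_stage(params, ratings, ["mu", "i_u", "i_n", "f_n"])  # round 1: rater factors frozen
f_u_round1, i_n_round1 = list(params["f_u"]), list(params["i_n"])
loss2 = train_stage(params, ratings, ["mu", "i_u", "f_u", "f_n"])  # round 2: note intercepts frozen
i_n_round2 = list(params["i_n"])
factors_round2 = (list(params["f_u"]), list(params["f_n"]))
loss3 = train_stage(params, ratings, ["mu", "i_u", "i_n"])  # round 3: intercepts only
print(f"{start:.4f} -> {loss1:.4f} -> {loss2:.4f} -> {loss3:.4f}")
```

Because each round starts from the previous round's parameters, the loss carries over between rounds, which is consistent with round 2 in the log starting at 2.582471 right after round 1 ended at 2.582575.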
INFO:birdwatch.scorer:MFGroupScorer_8 Low Diligence MF elapsed time: 10.82 secs (0.18 mins)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.63 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.matrix_factorization:epoch 40 0.10394581407308578
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07881125062704086
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.77 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.65 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.62 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.72 secs (0.01 mins)
INFO:birdwatch.matrix_factorization:epoch 80 0.11184175312519073
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08650964498519897
INFO:birdwatch.matrix_factorization:epoch 100 0.11080179363489151
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08564706891775131
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.18 secs (0.57 mins)
INFO:birdwatch.constants:MFGroupScorer_8: Compute tag thresholds for percentiles elapsed time: 0.60 secs (0.01 mins)
INFO:birdwatch.matrix_factorization:epoch 60 0.10367897152900696
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07856595516204834
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge(
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed)
INFO:birdwatch.run_scoring:MFGroupScorer_7 run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.run_scoring:MFGroupScorer_7 run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFGroupScorer_7 run_scorer_parallelizable: Loading data elapsed time: 21.75 secs (0.36 mins)
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_7 set to: 4
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_7. Original rating length: 118317340
INFO:birdwatch.scorer: Ratings after topic filter: 118317340
INFO:birdwatch.matrix_factorization:epoch 80 0.10345962643623352
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07761729508638382
INFO:birdwatch.matrix_factorization:Num epochs: 110
INFO:birdwatch.matrix_factorization:epoch 110 0.1107998788356781
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08561023324728012
INFO:birdwatch.matrix_factorization:Global Intercept: 0.16231831908226013
INFO:birdwatch.scorer:MFCoreScorer First full MF (initializated with stable-initialization) elapsed time: 788.77 secs (13.15 mins)
INFO:birdwatch.scorer:MFCoreScorer First MF/stable init elapsed time: 926.01 secs (15.43 mins)
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFCoreScorer
INFO:birdwatch.scorer: Ratings without assigned group: 0
INFO:birdwatch.scorer: Ratings after group filter: 1747969
INFO:birdwatch.matrix_factorization:epoch 100 0.11181958019733429
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0865093469619751
INFO:birdwatch.matrix_factorization:epoch 100 0.10343949496746063
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07770797610282898
INFO:birdwatch.matrix_factorization:Num epochs: 101
INFO:birdwatch.scorer:MFGroupScorer_7 Filter input elapsed time: 50.70 secs (0.85 mins)
INFO:birdwatch.mf_base_scorer:seeding with 0
INFO:birdwatch.matrix_factorization:epoch 101 0.10343949496746063
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07770797610282898
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1557331085205078
INFO:birdwatch.constants:Final round MF elapsed time: 256.45 secs (4.27 mins)
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_13 prescoring, about to call diligence with 19811605 final round ratings.
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_7: 4fd9db736954231a479492812e56c9bfbf3f96fac3fda4cb1184447f7dca0c9b
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note:
Num Ratings: 1186217, Num Unique Notes Rated: 79466, Num Unique Raters: 16354
INFO:birdwatch.scorer:MFGroupScorer_7 Prepare ratings elapsed time: 0.61 secs (0.01 mins)
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_7: 2e4005adbab8cf699eba97425ed6a53ad633fcbdd3e734f38ec03b8c02ff657c
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_7: bab26056c2c51b2beff057f4e0fe86fa071404051fc10055597e708318a26b55
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_7: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 16354, Notes: 79466
INFO:birdwatch.matrix_factorization:learning rate set to :1.0
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 14.927352578461228
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 72.5337532102238
INFO:birdwatch.matrix_factorization:epoch 0 6.320486545562744
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 5.846254825592041
INFO:birdwatch.matrix_factorization:epoch 20 0.3264220356941223
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.24869173765182495
INFO:birdwatch.matrix_factorization:epoch 40 0.15988917648792267
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.12357708066701889
INFO:birdwatch.matrix_factorization:epoch 60 0.12232820689678192
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09069563448429108
INFO:birdwatch.matrix_factorization:epoch 80 0.11620274931192398
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08597643673419952
INFO:birdwatch.matrix_factorization:epoch 100 0.11545437574386597
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0853646770119667
INFO:birdwatch.matrix_factorization:epoch 120 0.11535768210887909
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08523453027009964
INFO:birdwatch.matrix_factorization:epoch 140 0.11534492671489716
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08524022996425629
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.090680 | time=282.4s
INFO:birdwatch.matrix_factorization:Num epochs: 154
INFO:birdwatch.matrix_factorization:epoch 154 0.1153431162238121
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08523191511631012
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17988497018814087
INFO:birdwatch.scorer:MFGroupScorer_7 First MF/stable init elapsed time: 19.79 secs (0.33 mins)
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_7
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
None,
raterInitState:
raterParticipantId internalRaterFactor1
0 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... 0.631781
1 00022C96980039352E2D04B5E533090FA8BA333F87C5EB... -0.191494
2 0002CA11E7127598E26C281F887129ADA2623C82BBCE8F... -0.410286
3 00043CBC4A8DCE4003E776DCD459F07595B529D190FE6A... -0.579432
4 0006A0E14304DF01B1004C185280BD0429F985BC9BA3BE... -0.037304
... ... ...
103056 FFFE47B0979CC079B88D01EEBB42203E78DD1CC8115671... 0.032142
103057 FFFE4A4B357B94699BF04D58296EE33122C50C0519E3D6... 0.554487
103058 FFFE83C62E7D3E361E85273D9A8BC1D7D206AF97FAA90E... -0.073755
103059 FFFEB27D6E27351D14EB43777F265F694744ABB4B3B7AD... -0.649808
103060 FFFF7E0B3ADB6FC5FB42B0F01FFD24495410C1AE4AC986... 0.059724
[103061 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 103061, vs. num we are initializing: 103061 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 103061 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=18.256786 | time=0.3s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
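These MERGE ERROR lines are not pandas exceptions. They appear to come from type checks inside the scoring code, and the `(allowed)` suffix suggests the run tolerates them. One common way an int64 count column such as `numRatings` becomes float64 after `DataFrame.merge` is a left merge in which some keys have no match. pandas fills those cells with NaN, and plain int64 cannot hold NaN. The sketch below reproduces that upcast with invented toy data. The column names only mirror the log. It also shows two standard ways to get an integer column back. This is not a claim about what the scorer does internally.

```python
import pandas as pd

# Toy data, invented for illustration: every note exists, but only some have ratings.
notes = pd.DataFrame({"noteId": [1, 2, 3]})
ratingCounts = pd.DataFrame({"noteId": [1, 3], "numRatings": [5, 2]})

# A left merge keeps note 2, which has no match, so numRatings gets NaN there
# and the column is upcast from int64 to float64.
merged = notes.merge(ratingCounts, on="noteId", how="left")
print(merged["numRatings"].dtype)  # float64

# Option 1: decide what a missing count means (here 0) and cast back to int64.
filled = merged.assign(numRatings=merged["numRatings"].fillna(0).astype("int64"))
print(filled["numRatings"].tolist())  # [5, 0, 2]

# Option 2: keep "missing" distinct from 0 with pandas' nullable Int64 dtype.
nullable = merged.assign(numRatings=merged["numRatings"].astype("Int64"))
print(nullable["numRatings"].dtype)  # Int64
```

Option 1 is appropriate only when a missing row really means zero ratings. Option 2 keeps the distinction at the cost of using a pandas extension dtype.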
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.68 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.78 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.66 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.63 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.76 secs (0.01 mins) | |
INFO:birdwatch.matrix_factorization:epoch 100 0.11181925237178802 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08650881797075272 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.853110 | time=36.5s | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.56 secs (0.58 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_7 Compute scored notes elapsed time: 43.10 secs (0.72 mins) | |
INFO:birdwatch.matrix_factorization:Num epochs: 112 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.matrix_factorization:epoch 112 0.11181748658418655 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.086497962474823 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.16396962106227875 | |
INFO:birdwatch.scorer:MFExpansionPlusScorer First full MF (initializated with stable-initialization) elapsed time: 858.38 secs (14.31 mins) | |
INFO:birdwatch.note_ratings:Total ratings: 1747611 post-tombstones and 358 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 1321793, including 1321790 post-tombstones and 3 pre-tombstones. | |
INFO:birdwatch.scorer:MFExpansionPlusScorer First MF/stable init elapsed time: 989.60 secs (16.49 mins) | |
INFO:birdwatch.note_ratings:Total valid ratings: 139212 | |
INFO:birdwatch.scorer:MFGroupScorer_7 Compute valid ratings elapsed time: 2.21 secs (0.04 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
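The SettingWithCopyWarning above points at `helpfulness_scores.py:31`, which already uses `.loc`. In that situation the warning usually means `scoredNotes` was itself produced by filtering or slicing another DataFrame, so pandas cannot tell whether the write should reach the parent frame. The usual resolution is an explicit `.copy()` where the subset is created. How `scoredNotes` is actually built in the scorer is not visible in this log. The sketch below assumes it is a filtered subset and uses hypothetical toy data.

```python
import pandas as pd

# Hypothetical data; the column names only echo the log line above.
scoredNotesAll = pd.DataFrame(
    {"noteId": [1, 2, 3], "noteAuthorParticipantId": ["a", "b", "a"]}
)

# An explicit .copy() makes the filtered subset an independent DataFrame,
# so adding a column to it is unambiguous and does not touch the parent.
scoredNotes = scoredNotesAll[scoredNotesAll["noteId"] > 1].copy()
scoredNotes["noteCount"] = 1

# Example downstream use: count notes per author.
authorCounts = scoredNotes.groupby("noteAuthorParticipantId")["noteCount"].sum()
print(authorCounts.to_dict())
print("noteCount" in scoredNotesAll.columns)  # False: the parent frame is unchanged
```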
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_7 Helpfulness scores pre-harassment elapsed time: 0.30 secs (0.00 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 16354 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 56374 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 15545 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 13251 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 1186217 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 1011352 | |
INFO:birdwatch.scorer:MFGroupScorer_7 Filtering by helpfulness score elapsed time: 1.45 secs (0.02 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 654964 | |
1 52803 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 303585 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 602485, Num Unique Notes Rated: 44172, Num Unique Raters: 11720 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 566219 | |
1 36266 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.06019402972688117 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 15.61294325263332 with BCEWithLogitsLoss | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 11720, Notes: 44172 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
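The two FutureWarnings above (`matrix_factorization.py` lines 187 and 193) flag `df[col].fillna(value, inplace=True)`. The same pattern recurs later at `helpfulness_scores.py:173`. The inplace call runs on an intermediate Series, and the warning states that from pandas 3.0 the change will not propagate back to the DataFrame. The sketch below shows both rewrites the warning itself suggests, on a hypothetical toy frame whose column name only echoes the log.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the column name only echoes the log message above.
noteInit = pd.DataFrame(
    {"noteId": [1, 2, 3], "internalNoteIntercept": [0.4, np.nan, -0.1]}
)

# Flagged pattern (not run here):
#   noteInit["internalNoteIntercept"].fillna(0.0, inplace=True)

# Rewrite 1 from the warning: assign the filled Series back to the column.
noteInit["internalNoteIntercept"] = noteInit["internalNoteIntercept"].fillna(0.0)

# Rewrite 2 from the warning: call fillna on the DataFrame with a {column: value} dict.
other = pd.DataFrame({"x": [np.nan, 1.0]})
other.fillna({"x": 0.0}, inplace=True)

print(noteInit["internalNoteIntercept"].tolist())  # [0.4, 0.0, -0.1]
print(other["x"].tolist())  # [0.0, 1.0]
```

Rewrite 1 is the more explicit of the two. It behaves the same under the current default and under pandas' Copy-on-Write mode.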
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 13.639522774608348 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 51.406569965870304 | |
INFO:birdwatch.matrix_factorization:epoch 0 3.4724719524383545 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.548095703125 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.6847002506256104 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.37777310609817505 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.4944334626197815 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.31795963644981384 | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFExpansionPlusScorer | |
INFO:birdwatch.matrix_factorization:epoch 60 0.4676464796066284 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3076119124889374 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.463948130607605 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.30611005425453186 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.46341246366500854 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3058334290981293 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.46333253383636475 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.30579888820648193 | |
INFO:birdwatch.matrix_factorization:Num epochs: 125 | |
INFO:birdwatch.matrix_factorization:epoch 125 0.4633316993713379 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3057929277420044 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.2689174711704254 | |
INFO:birdwatch.scorer:MFGroupScorer_7 Harassment tag consensus elapsed time: 9.01 secs (0.15 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_7 Helpfulness scores post-harassment elapsed time: 0.41 secs (0.01 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 16354 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 56374 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 14661 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 12367 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 1186217 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 871392 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 12367, Notes: 79428 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 10.970841516845445 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 70.46106573946794 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.37179672718048096 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3022027611732483 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.11187608540058136 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07947373390197754 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10989455878734589 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08022681623697281 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10831080377101898 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07662466168403625 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.10822497308254242 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07705562561750412 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10818985849618912 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07679852843284607 | |
INFO:birdwatch.matrix_factorization:Num epochs: 102 | |
INFO:birdwatch.matrix_factorization:epoch 102 0.1081920713186264 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07686451077461243 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1820950210094452 | |
INFO:birdwatch.constants:Final round MF elapsed time: 11.30 secs (0.19 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_7 prescoring, about to call diligence with 871392 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 0001C21FD89AC65310D4D74174C0986CDF457DA24DADAB... 0.034742 | |
1 0003E67BB62E658363186A00B13637CF1A58748C4E4ECE... 0.251330 | |
2 0007FA945EF35219D0388D94715189F6231A77263D83B1... 0.295828 | |
3 0009FC5E666A87A24C6E0A4F985A0F8128DE237BBB6D7B... 0.371677 | |
4 000F1687C56AB92D846F2B9BFA71AE16D8A88426754E3B... 0.678841 | |
... ... ... | |
12362 FFE9CF3FC6CEBF09A2748F1A977245A86BE16A74850C3F... -0.056599 | |
12363 FFEAF4A561DFA90006C71904FB176E3BA20BF932ED1AE6... -0.148612 | |
12364 FFED9EACB703DDAE2E9BBF2B5A7FC35065AB055878F50D... 0.393813 | |
12365 FFEF7AD019F0E1EE28157E1298D5469164E8D7AF2CA91D... -0.270717 | |
12366 FFFBC05DB8408BB532985642C4DE00EC619B062CB60E2E... 0.389263 | |
[12367 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 12367, vs. num we are initializing: 12367 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 12367 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=16.543449 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.515074 | time=2.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.439981 | time=71.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.099560 | time=4.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.019695 | time=6.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.991262 | time=8.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.976698 | time=10.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.967536 | time=12.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.961210 | time=14.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=2.956856 | time=16.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=2.953736 | time=18.0s | |
INFO:birdwatch.matrix_factorization:Num epochs: 112 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=2.951627 | time=20.0s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(0.2657, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=2.951568 | time=0.0s | |
INFO:birdwatch.matrix_factorization:epoch 112 0.11181716620922089 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08649764209985733 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.16396866738796234 | |
INFO:birdwatch.scorer:MFExpansionScorer First full MF (initializated with stable-initialization) elapsed time: 905.44 secs (15.09 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.870546 | time=2.0s | |
INFO:birdwatch.scorer:MFExpansionScorer First MF/stable init elapsed time: 1040.32 secs (17.34 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.859959 | time=3.9s | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.858937 | time=5.9s | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFExpansionScorer | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.65 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.13 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.82 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.03 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=115 | loss=1.858854 | time=7.6s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.547362 | time=0.0s | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.67 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.62 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.467697 | time=1.2s | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.74 secs (0.01 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.466692 | time=2.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.466650 | time=3.0s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(1.2692, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 2.9516, 1.8589, 0.4667 | |
INFO:birdwatch.scorer:MFGroupScorer_7 Low Diligence MF elapsed time: 31.51 secs (0.53 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.64 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.381900 | time=105.1s | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.78 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.65 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.64 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.76 secs (0.01 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=0.090583 | time=406.9s | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.79 secs (0.56 mins) | |
INFO:birdwatch.scorer:MFCoreScorer Compute scored notes elapsed time: 179.81 secs (3.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.367405 | time=138.9s | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.10 secs (0.57 mins) | |
INFO:birdwatch.constants:MFGroupScorer_7: Compute tag thresholds for percentiles elapsed time: 2.00 secs (0.03 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.run_scoring:MFGroupScorer_6 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.run_scoring:MFGroupScorer_6 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_6 run_scorer_parallelizable: Loading data elapsed time: 19.46 secs (0.32 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_6 set to: 4 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_6. Original rating length: 118317340 | |
INFO:birdwatch.scorer: Ratings after topic filter: 118317340 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.362330 | time=173.6s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.360109 | time=207.0s | |
INFO:birdwatch.scorer: Ratings after group filter: 5450867 | |
INFO:birdwatch.scorer:MFGroupScorer_6 Filter input elapsed time: 44.96 secs (0.75 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_6: 04b43cd2c96d07ac1e2ab768cf4f0fa515c9e79971a7fe8ae13f1ae30910d433 | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.64 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note:
Num Ratings: 4659293, Num Unique Notes Rated: 209890, Num Unique Raters: 31333
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.13 secs (0.00 mins)
INFO:birdwatch.scorer:MFGroupScorer_6 Prepare ratings elapsed time: 2.32 secs (0.04 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.77 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.03 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.69 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.66 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.73 secs (0.01 mins)
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_6: e11e3213ef7a5a637ac0d9affe20b03233bc8d429851ed0c55ce5975c5d19f41
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_6: bab26056c2c51b2beff057f4e0fe86fa071404051fc10055597e708318a26b55
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_6: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 31333, Notes: 209890
INFO:birdwatch.matrix_factorization:learning rate set to :1.0
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 22.198737433893946
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 148.70242236619538
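The two density figures logged here appear to be plain ratios of the filtered counts reported for MFGroupScorer_6 (4659293 ratings, 209890 notes, 31333 raters). A minimal sketch that reproduces them, assuming that is how matrix_factorization derives them (the helper name `rating_density` is hypothetical):

```python
def rating_density(num_ratings: int, num_notes: int, num_users: int) -> tuple[float, float]:
    """Return (ratings per note, ratings per user) for a filtered ratings matrix.

    Hypothetical helper: assumes the logged densities are simple count ratios.
    """
    return num_ratings / num_notes, num_ratings / num_users


# Counts from the MFGroupScorer_6 filtering step in this log.
per_note, per_user = rating_density(4659293, 209890, 31333)
print(per_note, per_user)

# Final-round counts for the same scorer (2986911 ratings, 208928 notes, 21925 users).
final_per_note, final_per_user = rating_density(2986911, 208928, 21925)
print(final_per_note, final_per_user)
```

The same ratios also match the later final-round lines (about 14.296 ratings per note and 136.233 per user), which supports reading these lines as simple averages rather than model statistics.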
INFO:birdwatch.matrix_factorization:epoch 0 6.272146224975586
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 5.815246105194092
INFO:birdwatch.matrix_factorization:epoch 20 0.32835713028907776
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.25326278805732727
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.358961 | time=240.8s
INFO:birdwatch.note_ratings:Total ratings: 101910486 post-tombstones and 232250 pre-tombstones
INFO:birdwatch.note_ratings:Total ratings created before statuses: 80436103, including 80370184 post-tombstones and 65919 pre-tombstones.
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=0.090571 | time=528.8s
INFO:birdwatch.matrix_factorization:epoch 40 0.13337045907974243
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09825057536363602
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
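These `(allowed)` dtype mismatches are consistent with a left merge in which some notes have no matching row. pandas fills the gap with NaN, and because int64 cannot hold NaN, the whole column is upcast to float64. The sketch below shows the effect and one common fix. The frames and values are hypothetical, and this is not necessarily how compute_note_stats resolves it:

```python
import pandas as pd

# Hypothetical per-note stats: note 3 has no ratings row.
notes = pd.DataFrame({"noteId": [1, 2, 3]})
counts = pd.DataFrame({"noteId": [1, 2], "numRatings": [5, 7]})

merged = notes.merge(counts, on="noteId", how="left")
upcast_dtype = merged["numRatings"].dtype  # float64: the unmatched note got NaN

# Treat "no ratings row" as zero ratings and restore the integer dtype.
merged["numRatings"] = merged["numRatings"].fillna(0).astype("int64")
print(upcast_dtype, merged["numRatings"].tolist())
```

If a missing count should stay distinguishable from zero, casting to pandas' nullable `"Int64"` dtype instead keeps the gap as `<NA>` without falling back to floats.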
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.65 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.13 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.83 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.03 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.67 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.61 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.72 secs (0.01 mins)
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.23 secs (0.57 mins)
INFO:birdwatch.scorer:MFExpansionPlusScorer Compute scored notes elapsed time: 206.16 secs (3.44 mins)
INFO:birdwatch.matrix_factorization:epoch 60 0.10895848274230957
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0793888196349144
INFO:birdwatch.matrix_factorization:epoch 80 0.10542146116495132
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07658693194389343
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.358307 | time=274.5s
INFO:birdwatch.note_ratings:Total valid ratings: 5578551
INFO:birdwatch.matrix_factorization:epoch 100 0.10493800044059753
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07620479166507721
INFO:birdwatch.scorer:MFCoreScorer Compute valid ratings elapsed time: 146.32 secs (2.44 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
scoredNotes.loc[:, c.noteCountKey] = 1
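The warning fires even though the flagged line already uses `.loc`, which suggests `scoredNotes` is itself a filtered slice of a larger frame. A common remedy is to take an explicit `.copy()` when the slice is created, so the later assignment unambiguously targets the new frame. A minimal sketch with hypothetical frames and status values (the `noteCount` column name echoes this log; nothing else is taken from helpfulness_scores.py):

```python
import pandas as pd

# Hypothetical note table; only non-NMR notes are treated as scored here.
notes = pd.DataFrame({"noteId": [1, 2, 3], "status": ["CRH", "NMR", "CRNH"]})

# Without .copy(), pandas 2.x cannot tell whether `scored` is a view or a copy
# of `notes`, and the assignment below can trigger SettingWithCopyWarning.
scored = notes[notes["status"] != "NMR"].copy()
scored.loc[:, "noteCount"] = 1

print(scored)
```

The explicit copy costs one allocation of the filtered rows, but it makes the intent clear and keeps the original frame untouched.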
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join(
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed)
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed)
INFO:birdwatch.scorer:MFCoreScorer Helpfulness scores pre-harassment elapsed time: 5.60 secs (0.09 mins)
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.13 secs (0.55 mins)
INFO:birdwatch.matrix_factorization:epoch 120 0.10487690567970276
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07612945884466171
INFO:birdwatch.scorer:MFExpansionScorer Compute scored notes elapsed time: 194.29 secs (3.24 mins)
INFO:birdwatch.matrix_factorization:epoch 140 0.1048690527677536
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07612219452857971
INFO:birdwatch.matrix_factorization:Num epochs: 142
INFO:birdwatch.matrix_factorization:epoch 142 0.10486896336078644
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07612641155719757
INFO:birdwatch.matrix_factorization:Global Intercept: 0.16801659762859344
INFO:birdwatch.scorer:MFGroupScorer_6 First MF/stable init elapsed time: 75.20 secs (1.25 mins)
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_6
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.357908 | time=307.9s
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.63 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.75 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.63 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.62 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.73 secs (0.01 mins)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed)
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.357659 | time=341.4s
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing:
tensor(1.0654, requires_grad=True)
INFO:birdwatch.reputation_matrix_factorization:
Round 2: learn rater rep (and everything else), freeze note intercept
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.357652 | time=0.2s
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.60 secs (0.56 mins)
INFO:birdwatch.scorer:MFGroupScorer_6 Compute scored notes elapsed time: 45.76 secs (0.76 mins)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed)
INFO:birdwatch.note_ratings:Total ratings: 5449674 post-tombstones and 1193 pre-tombstones
INFO:birdwatch.note_ratings:Total ratings created before statuses: 4368026, including 4367957 post-tombstones and 69 pre-tombstones.
INFO:birdwatch.note_ratings:Total valid ratings: 342817
INFO:birdwatch.scorer:MFGroupScorer_6 Compute valid ratings elapsed time: 6.02 secs (0.10 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
scoredNotes.loc[:, c.noteCountKey] = 1
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join(
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed)
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed)
INFO:birdwatch.scorer:MFGroupScorer_6 Helpfulness scores pre-harassment elapsed time: 0.56 secs (0.01 mins)
INFO:birdwatch.helpfulness_scores:Unique Raters: 31333
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 94982
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 26071
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 24617
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 4659293
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 3817415
INFO:birdwatch.scorer:MFGroupScorer_6 Filtering by helpfulness score elapsed time: 5.94 secs (0.10 mins)
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse-------------------
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel
0 2477884
1 139275
dtype: int64
INFO:birdwatch.tag_consensus:Number of rows with no tag label 1200256
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note:
Num Ratings: 2393398, Num Unique Notes Rated: 126280, Num Unique Raters: 23313
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel
0 2281056
1 112342
dtype: int64
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.04693828606859369
INFO:birdwatch.matrix_factorization:Using pos weight: 20.30456997382991 with BCEWithLogitsLoss
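The logged positive rate and positive weight can be reproduced from the post-filtering label breakdown just above (2281056 negatives, 112342 positives). The weight equals (1 - p) / p, which is the same as negatives / positives. This formula is inferred from the numbers rather than read from tag_consensus, so treat the sketch as an assumption. A weight of this form is what PyTorch's `BCEWithLogitsLoss(pos_weight=...)` uses to up-weight the rare positive class:

```python
def positive_rate_and_weight(num_negative: int, num_positive: int) -> tuple[float, float]:
    """Return (positive rate p, pos weight (1 - p) / p) for a binary label column.

    Assumed formula: it reproduces the values logged for both tag-consensus runs.
    """
    p = num_positive / (num_negative + num_positive)
    return p, (1.0 - p) / p


# Post-filtering notHelpfulSpamHarassmentOrAbuseLabel counts for MFGroupScorer_6.
rate, weight = positive_rate_and_weight(2281056, 112342)
print(rate, weight)

# The larger breakdown later in this log (48168519 negatives, 3247543 positives).
rate2, weight2 = positive_rate_and_weight(48168519, 3247543)
print(rate2, weight2)
```

The second call matches the later "Positive Rate: 0.0631..." and "pos weight: 14.83..." lines, which is what suggests the formula is applied consistently across scorers.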
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 23313, Notes: 126280
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
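Both FutureWarnings come from calling `fillna(..., inplace=True)` on a column selected with `[]`; under pandas 3.0 copy-on-write that intermediate always behaves as a copy, so `noteInit` would no longer be updated. Following the warning's own suggestion, a sketch that assigns results back instead. The frame contents are hypothetical; the column names echo those printed elsewhere in this log, not the actual values of `c.internalNoteInterceptKey` or `c.note_factor_key(i)`:

```python
import numpy as np
import pandas as pd

# Hypothetical initial note parameters with gaps for notes lacking prior state.
noteInit = pd.DataFrame(
    {
        "noteId": [1, 2, 3],
        "internalNoteIntercept": [0.3, np.nan, -0.1],
        "internalNoteFactor1": [np.nan, 0.5, np.nan],
    }
)

# Assign the filled column back rather than mutating a chained selection.
noteInit["internalNoteIntercept"] = noteInit["internalNoteIntercept"].fillna(0.0)

# Dict form from the warning text: fills several columns in one call on the frame itself.
noteInit = noteInit.fillna({"internalNoteFactor1": 0.0})
print(noteInit)
```

Both forms operate on `noteInit` directly, so they behave the same with or without copy-on-write enabled.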
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 18.95310421286031 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 102.6636640501008 | |
INFO:birdwatch.matrix_factorization:epoch 0 3.355055570602417 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.4591138362884521 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=0.090568 | time=649.8s | |
INFO:birdwatch.matrix_factorization:epoch 20 0.675491213798523 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3443944454193115 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.4422590732574463 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2770985960960388 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.110433 | time=31.2s | |
INFO:birdwatch.matrix_factorization:epoch 60 0.4074243903160095 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2647395431995392 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.40271952748298645 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.26267001032829285 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.4020889401435852 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2623730003833771 | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 590255 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 574793 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 463424 | |
INFO:birdwatch.matrix_factorization:Num epochs: 119 | |
INFO:birdwatch.matrix_factorization:epoch 119 0.40201109647750854 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.26234227418899536 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.2431156486272812 | |
INFO:birdwatch.scorer:MFGroupScorer_6 Harassment tag consensus elapsed time: 35.06 secs (0.58 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_6 Helpfulness scores post-harassment elapsed time: 1.00 secs (0.02 mins) | |
INFO:birdwatch.note_ratings:Total ratings: 118075724 post-tombstones and 241616 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 93087196, including 93019731 post-tombstones and 67465 pre-tombstones. | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 31333 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 94982 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 23379 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 21925 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 4659293 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 2986911 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 21925, Notes: 208928 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 14.296365255016083 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 136.23311288483467 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.38625186681747437 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3190094530582428 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.100222 | time=62.9s | |
INFO:birdwatch.matrix_factorization:epoch 20 0.10281260311603546 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0706467404961586 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.09916238486766815 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0704370066523552 | |
INFO:birdwatch.note_ratings:Total ratings: 118073534 post-tombstones and 241615 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 93085538, including 93018073 post-tombstones and 67465 pre-tombstones. | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09785157442092896 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06802797317504883 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.09775014966726303 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06777236610651016 | |
INFO:birdwatch.note_ratings:Total valid ratings: 7102556 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.09773420542478561 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06759101897478104 | |
INFO:birdwatch.scorer:MFExpansionPlusScorer Compute valid ratings elapsed time: 179.64 secs (2.99 mins) | |
INFO:birdwatch.matrix_factorization:Num epochs: 103 | |
INFO:birdwatch.matrix_factorization:epoch 103 0.0977339893579483 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06772854179143906 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1703343689441681 | |
INFO:birdwatch.constants:Final round MF elapsed time: 41.85 secs (0.70 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_6 prescoring, about to call diligence with 2986911 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.099368 | time=94.9s | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 0002188E5ED3028646C97CBE9ADCD12CB5B8BFAF8819BD... -0.174797 | |
1 0002EEF8B312A7DCBF698391778CD9D0F7ADA652FBFB9E... -0.298513 | |
2 0005983E6E18862483AB372C5B61FEBC1F8A573E7701F9... -0.499946 | |
3 000677AE7F63255B464AD153D315B2E25DB8BF771A379D... 0.469928 | |
4 000760B0C9739248AF3CA6B833A219CC24A4B85C5B4D0D... 0.212109 | |
... ... ... | |
21920 FFFAA9B8DDDDF9C3CD12F97B13C1658E63F495884418D6... 0.010096 | |
21921 FFFBB8B4BE340D5AAC99E9168F2711EBAB3CE5C9A2567B... -0.090337 | |
21922 FFFC8248F057883916F06F78A0DB7878BFB2C6162434E2... -0.543905 | |
21923 FFFD65E501817C7A5590FADEE2646D40BF1BA5582F6801... -0.327678 | |
21924 FFFFAB2FDBC1968F4CFE97A86D88963D702B636365B6CD... -0.009151 | |
[21925 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 21925, vs. num we are initializing: 21925 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 21925 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=15.637968 | time=0.0s | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 432469 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 100691291 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 77805643 | |
INFO:birdwatch.scorer:MFCoreScorer Filtering by helpfulness score elapsed time: 156.24 secs (2.60 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFExpansionPlusScorer Helpfulness scores pre-harassment elapsed time: 7.83 secs (0.13 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=170 | loss=0.090568 | time=729.9s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.406891 | time=6.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=2.099340 | time=105.6s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.007650 | time=0.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.333924 | time=0.2s | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 48758374 | |
1 3306464 | |
dtype: int64 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.023547 | time=13.6s | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 25671715 | |
INFO:birdwatch.note_ratings:Total valid ratings: 7101780 | |
INFO:birdwatch.scorer:MFExpansionScorer Compute valid ratings elapsed time: 170.81 secs (2.85 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.962339 | time=20.4s | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFExpansionScorer Helpfulness scores pre-harassment elapsed time: 6.81 secs (0.11 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.943640 | time=27.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.255059 | time=21.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.934534 | time=33.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.928887 | time=40.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.925113 | time=47.4s | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 51416062, Num Unique Notes Rated: 1002225, Num Unique Raters: 416196 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.254126 | time=42.3s | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 48168519 | |
1 3247543 | |
dtype: int64 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=2.922410 | time=54.2s | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.06316203290714874 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 14.832295984995428 with BCEWithLogitsLoss | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=2.920536 | time=61.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.254087 | time=52.7s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(2.4071, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 3.3577, 2.0993, 0.2541 | |
INFO:birdwatch.scorer:MFGroupScorer_13 Low Diligence MF elapsed time: 525.92 secs (8.77 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=2.919230 | time=67.8s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(0.2776, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=2.919194 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.754996 | time=6.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.743777 | time=13.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.007090 | time=79.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.742854 | time=19.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.742818 | time=21.9s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.448196 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.380205 | time=3.9s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
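The MERGE ERROR above reports that `numRatings` and `numRatingsLast28` leave the merge in `compute_note_stats` as float64 instead of int64. A common cause in pandas is a left or outer merge that leaves some rows without a match. Those rows get NaN, and an int64 column that holds NaN is upcast to float64. The sketch below reproduces this on hypothetical toy frames, where only the column names come from the log. It then shows two ways to keep the counts integer: fill the gaps and cast back, or use the nullable `Int64` dtype. It is not the scorer's code, and it is an assumption that this is the exact cause at line 322.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the log.
notes = pd.DataFrame({"noteId": [1, 2, 3]})
counts = pd.DataFrame({"noteId": [1, 2], "numRatings": [5, 7]})

# noteId 3 has no ratings, so the left merge fills its count with NaN,
# and an int64 column containing NaN is upcast to float64.
merged = notes.merge(counts, on="noteId", how="left")
print(merged["numRatings"].dtype)

# Option 1: fill the gaps (no ratings -> 0), then cast back to int64.
filled = merged.assign(numRatings=merged["numRatings"].fillna(0).astype("int64"))
print(filled["numRatings"].tolist())

# Option 2: switch to the nullable Int64 dtype before merging, so missing
# values become <NA> and the column stays integer-typed.
counts_nullable = counts.astype({"numRatings": "Int64"})
nullable = notes.merge(counts_nullable, on="noteId", how="left")
print(nullable["numRatings"].dtype)
```

Option 1 fits counts, where a missing match genuinely means zero. Option 2 fits columns where "missing" and "zero" must stay distinct.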
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.379360 | time=7.7s | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.60 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.72 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.63 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.379324 | time=9.6s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(1.3168, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 2.9192, 1.7428, 0.3793 | |
INFO:birdwatch.scorer:MFGroupScorer_6 Low Diligence MF elapsed time: 102.95 secs (1.72 mins) | |
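The three numbers on the "Low diligence training loss: 2.9192, 1.7428, 0.3793" line match, to four decimals, the last loss printed in each of the three rounds above: round 1 epoch=300 loss=2.919230, round 2 epoch=100 loss=1.742818, and round 3 epoch=075 loss=0.379324. This run uses `--parallel`, so progress lines from several scorers interleave, and matching epochs to a scorer by eye is error-prone. The regex sketch below parses the `epoch=... | loss=... | time=...s` format and keeps the final loss of each round. It assumes the lines have already been grouped by scorer and round, which the log itself does not do. The `final_loss` helper is hypothetical.

```python
import re

# Matches the reputation_matrix_factorization progress format, e.g.
# "...:epoch=300 | loss=2.919230 | time=67.8s"
EPOCH_RE = re.compile(r"epoch=(\d+) \| loss=([0-9.]+) \| time=([0-9.]+)s")


def final_loss(round_lines: list[str]) -> float:
    """Return the loss on the last epoch line of one training round."""
    losses = [float(m.group(2)) for line in round_lines if (m := EPOCH_RE.search(line))]
    if not losses:
        raise ValueError("no epoch lines in this round")
    return losses[-1]


# Lines copied from this log and grouped by round by hand (an assumption:
# the parallel run interleaves scorers, so grouping must happen upstream).
prefix = "INFO:birdwatch.reputation_matrix_factorization:"
rounds = [
    [prefix + "epoch=270 | loss=2.920536 | time=61.0s",
     prefix + "epoch=300 | loss=2.919230 | time=67.8s"],
    [prefix + "epoch=090 | loss=1.742854 | time=19.7s",
     prefix + "epoch=100 | loss=1.742818 | time=21.9s"],
    [prefix + "epoch=060 | loss=0.379360 | time=7.7s",
     prefix + "epoch=075 | loss=0.379324 | time=9.6s"],
]

# Same layout as the "Low diligence training loss" summary line.
summary = ", ".join(f"{final_loss(r):.4f}" for r in rounds)
print(summary)
```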
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.62 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.71 secs (0.01 mins) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 416196, Notes: 1002225 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
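The FutureWarning above names its own fix. `noteInit[col].fillna(0.0, inplace=True)` acts on an intermediate Series, and from pandas 3.0 that Series will always behave as a copy, so `noteInit` would never be updated. Below is a minimal sketch of the two suggested replacements on a hypothetical `noteInit`. It assumes `c.internalNoteInterceptKey` and `c.note_factor_key(1)` resolve to the `internalNoteIntercept` and `internalNoteFactor1` column names printed in the `noteStats` index later in this log.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for noteInit; column names assumed to match the
# scorer's constants.
noteInit = pd.DataFrame(
    {
        "noteId": [1, 2, 3],
        "internalNoteIntercept": [0.4, np.nan, -0.1],
        "internalNoteFactor1": [np.nan, 0.2, np.nan],
    }
)

# Flagged pattern (will stop updating noteInit in pandas 3.0):
#   noteInit["internalNoteIntercept"].fillna(0.0, inplace=True)

# Replacement 1: assign the filled Series back to the column.
noteInit["internalNoteIntercept"] = noteInit["internalNoteIntercept"].fillna(0.0)

# Replacement 2: call fillna on the DataFrame itself with a per-column mapping.
noteInit.fillna({"internalNoteFactor1": 0.0}, inplace=True)
```

The same change would apply to the identical warning raised for `helpfulnessScores[...].fillna(0, inplace=True)` in helpfulness_scores.py.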
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 51.30191523859413 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 123.53809743486242 | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.61 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.71 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.61 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.60 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.matrix_factorization:epoch 0 3.2293245792388916 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.4528207778930664 | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.71 secs (0.01 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 747994 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 712511 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 607970 | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.97 secs (0.57 mins) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.82 secs (0.56 mins) | |
INFO:birdwatch.constants:MFGroupScorer_6: Compute tag thresholds for percentiles elapsed time: 6.59 secs (0.11 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 747974 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 712504 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 607953 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.run_scoring:MFGroupScorer_5 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.007018 | time=157.4s | |
INFO:birdwatch.run_scoring:MFGroupScorer_5 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_5 run_scorer_parallelizable: Loading data elapsed time: 19.34 secs (0.32 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_5 set to: 4 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.7047713994979858 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.42888182401657104 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_5. Original rating length: 118317340 | |
INFO:birdwatch.scorer: Ratings after topic filter: 118317340 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 555376 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 116569833 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 89986635 | |
INFO:birdwatch.scorer:MFExpansionPlusScorer Filtering by helpfulness score elapsed time: 182.63 secs (3.04 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.constants:MFGroupScorer_13: Compute tag thresholds for percentiles elapsed time: 60.91 secs (1.02 mins) | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 56923884 | |
1 3891470 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 29101252 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 555359 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 116567689 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 89985012 | |
INFO:birdwatch.scorer:MFExpansionScorer Filtering by helpfulness score elapsed time: 184.20 secs (3.07 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=080 | loss=0.007014 | time=208.0s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(0.6817, requires_grad=True) | |
INFO:birdwatch.helpfulness_model:Helpfulness reputation loss: 0.0150, 0.0906, 0.0070 | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/reputation_scorer.py, in _prescore_notes_and_users, at line 135: noteStats = noteStats.merge(noteStatusHistory[[c.noteIdKey]].drop_duplicates(), how="outer") | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.reputation_scorer:Reputation prescoring: returning these columns: | |
noteStats: Index(['noteId', 'internalNoteIntercept', 'internalNoteFactor1', | |
'lowDiligenceNoteInterceptRound2'], | |
dtype='object') | |
raterStats: Index(['raterParticipantId', 'internalRaterReputation', | |
'internalRaterIntercept', 'internalRaterFactor1', | |
'lowDiligenceRaterInterceptRound2'], | |
dtype='object') | |
INFO:birdwatch.scorer: Ratings after group filter: 555225 | |
INFO:birdwatch.run_scoring:MFGroupScorer_4 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.scorer:MFGroupScorer_5 Filter input elapsed time: 43.67 secs (0.73 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_5: 1c9ed1ed6010515673b56328cd8b3bc9b0a40557890e65105bf097ca26e95211 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 237917, Num Unique Notes Rated: 22188, Num Unique Raters: 3852 | |
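The process_data message states a density filter: keep raters with at least 10 ratings and notes with at least 5 raters. The log does not show how the filter is applied. Dropping notes can push a rater below the threshold, and dropping raters can push a note below it. The sketch below therefore repeats both steps until nothing changes. That iteration, the step order, and the `filter_min_counts` name are all assumptions, not the scorer's implementation. The column names `noteId` and `raterParticipantId` appear elsewhere in this log.

```python
import pandas as pd


def filter_min_counts(ratings: pd.DataFrame, min_ratings_per_rater: int = 10,
                      min_raters_per_note: int = 5) -> pd.DataFrame:
    """Hypothetical sketch: drop sparse notes and raters until both thresholds hold."""
    while True:
        before = len(ratings)
        # Keep notes rated by enough distinct raters.
        raters_per_note = ratings.groupby("noteId")["raterParticipantId"].transform("nunique")
        ratings = ratings[raters_per_note >= min_raters_per_note]
        # Keep raters who still have enough ratings on the surviving notes.
        rater_ids = ratings["raterParticipantId"]
        ratings = ratings[rater_ids.map(rater_ids.value_counts()) >= min_ratings_per_rater]
        if len(ratings) == before:  # stable: no row removed in this pass
            return ratings


# Toy data: raters A-E each rate notes 1-10, rater F rates only note 1,
# and note 99 has only two raters.
rows = [(r, n) for r in "ABCDE" for n in range(1, 11)]
rows += [("F", 1), ("A", 99), ("B", 99)]
ratings = pd.DataFrame(rows, columns=["raterParticipantId", "noteId"])

kept = filter_min_counts(ratings)
print(f"Num Ratings: {len(kept)}, Num Unique Notes Rated: {kept['noteId'].nunique()}, "
      f"Num Unique Raters: {kept['raterParticipantId'].nunique()}")
```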
INFO:birdwatch.scorer:MFGroupScorer_5 Prepare ratings elapsed time: 0.16 secs (0.00 mins) | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_5: c51cdee3a36b17132270a37efb2fdfee8481384380b6c2278880f1866213e291 | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_5: bab26056c2c51b2beff057f4e0fe86fa071404051fc10055597e708318a26b55 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_5: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 3852, Notes: 22188 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 10.72277807824049 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 61.76453790238837 | |
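The "Ratings per note" and "Ratings per user" figures for MFGroupScorer_5 match the rating count on the preceding process_data line (Num Ratings: 237917) divided by the note and user counts (Notes: 22188, Users: 3852). The same holds for the later matrix-factorization runs in this log, for example 118264 ratings over 11422 notes and 2729 users. That the scorer computes them exactly this way is inferred from the numbers, not read from its source. A stdlib sketch:

```python
def ratings_per_entity(num_ratings: int, num_notes: int, num_users: int) -> tuple[float, float]:
    """Average number of ratings per note and per user in a training set."""
    return num_ratings / num_notes, num_ratings / num_users


# MFGroupScorer_5 counts from the process_data and "Users/Notes" lines above.
per_note, per_user = ratings_per_entity(237_917, 22_188, 3_852)
print(per_note, per_user)
```

A low ratings-per-note value, here about 10.7 against about 51.3 for the 416196-user run earlier in this log, indicates a much sparser matrix for this group scorer.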
INFO:birdwatch.matrix_factorization:epoch 0 6.505688190460205 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.011424541473389 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.3373934328556061 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2626161575317383 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.13293388485908508 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09004734456539154 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10200294852256775 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06733014434576035 | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 56922595 | |
1 3891288 | |
dtype: int64 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.0965874046087265 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0627824142575264 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.09589532017707825 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0619732066988945 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.09580479562282562 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0619276687502861 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.09579093754291534 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06191651150584221 | |
INFO:birdwatch.matrix_factorization:Num epochs: 146 | |
INFO:birdwatch.matrix_factorization:epoch 146 0.09579023718833923 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06191713735461235 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.18588478863239288 | |
INFO:birdwatch.scorer:MFGroupScorer_5 First MF/stable init elapsed time: 4.38 secs (0.07 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_5 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 29101100 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.63 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.72 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.65 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.64 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.74 secs (0.01 mins) | |
INFO:birdwatch.run_scoring:MFGroupScorer_4 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_4 run_scorer_parallelizable: Loading data elapsed time: 19.71 secs (0.33 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_4 set to: 4 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 60117880, Num Unique Notes Rated: 1096849, Num Unique Raters: 532307 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_4. Original rating length: 118317340 | |
INFO:birdwatch.scorer: Ratings after topic filter: 118317340 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.matrix_factorization:epoch 40 0.4833068251609802 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3419259488582611 | |
INFO:birdwatch.run_scoring:MFGroupScorer_3 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 56285750 | |
1 3832130 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.06374359841032319 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 14.687849838079604 with BCEWithLogitsLoss | |
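The positive rate and pos weight logged for the notHelpfulSpamHarassmentOrAbuse tag model can be reproduced from the post-filtering label breakdown just above (label 0: 56285750, label 1: 3832130). The rate is positives / total. The weight equals negatives / positives, or equivalently (1 - p) / p. That ratio is the usual choice for the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`, because it gives the rare positive class the same total weight as the negatives. The later 112002 / 6262 breakdown yields the logged 17.886 the same way. This formula is inferred from the logged numbers, not read from the scorer's source.

```python
def positive_rate_and_weight(num_negative: int, num_positive: int) -> tuple[float, float]:
    """Return (positive rate, pos_weight) for a binary label column.

    pos_weight = negatives / positives, i.e. (1 - p) / p, so the positives
    carry the same total weight as the negatives.
    """
    p = num_positive / (num_negative + num_positive)
    return p, num_negative / num_positive


# Post-filtering label counts copied from the tag_consensus lines above.
rate, weight = positive_rate_and_weight(56_285_750, 3_832_130)
print(f"positive rate {rate:.6f}, pos weight {weight:.6f}")
```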
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.65 secs (0.56 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_5 Compute scored notes elapsed time: 40.79 secs (0.68 mins) | |
INFO:birdwatch.run_scoring:MFGroupScorer_3 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_3 run_scorer_parallelizable: Loading data elapsed time: 19.64 secs (0.33 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_3 set to: 4 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.note_ratings:Total ratings: 555087 post-tombstones and 138 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 441165, including 441165 post-tombstones and 0 pre-tombstones. | |
INFO:birdwatch.note_ratings:Total valid ratings: 26567 | |
INFO:birdwatch.scorer:MFGroupScorer_5 Compute valid ratings elapsed time: 0.99 secs (0.02 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
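This SettingWithCopyWarning fires even though line 31 already writes through `.loc[:, ...]`. That usually means `scoredNotes` is itself a filtered slice of a larger DataFrame, so pandas cannot tell whether the write is meant to reach the parent. The log does not show how `scoredNotes` is built. The sketch below assumes a boolean filter over a hypothetical `notes` frame and takes an explicit `.copy()` of the subset. With the copy the intent is unambiguous, and the assignment runs with every warning escalated to an error.

```python
import warnings

import pandas as pd

# Hypothetical parent frame; only the column names echo the log.
notes = pd.DataFrame({"noteId": [1, 2, 3], "crhBool": [1, 0, 1]})

with warnings.catch_warnings():
    # Escalate any warning (SettingWithCopyWarning included) to an error,
    # to show that the pattern below raises none.
    warnings.simplefilter("error")

    # Explicit copy: scoredNotes owns its data, so writing to it is unambiguous
    # and cannot leak back into `notes`.
    scoredNotes = notes[notes["crhBool"] == 1].copy()
    scoredNotes.loc[:, "noteCount"] = 1
```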
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_5 Helpfulness scores pre-harassment elapsed time: 0.12 secs (0.00 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 3852 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 16995 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 3650 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 3182 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 237917 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 211573 | |
INFO:birdwatch.scorer:MFGroupScorer_5 Filtering by helpfulness score elapsed time: 0.27 secs (0.00 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 140054 | |
1 11645 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 59874 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 118264, Num Unique Notes Rated: 11422, Num Unique Raters: 2729 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 112002 | |
1 6262 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.0529493336941081 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 17.88597892047269 with BCEWithLogitsLoss | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 2729, Notes: 11422 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 10.354053580808966 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 43.33602052033712 | |
INFO:birdwatch.matrix_factorization:epoch 0 3.3255653381347656 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.4221495389938354 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.6385119557380676 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2950723469257355 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 60116394, Num Unique Notes Rated: 1096847, Num Unique Raters: 532290 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.4211670756340027 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.23981989920139313 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.3886503577232361 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.22783590853214264 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.38440507650375366 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.22651779651641846 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.3837811350822449 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.22617749869823456 | |
INFO:birdwatch.matrix_factorization:Num epochs: 113 | |
INFO:birdwatch.matrix_factorization:epoch 113 0.3837122321128845 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.22612765431404114 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.2777239680290222 | |
INFO:birdwatch.scorer:MFGroupScorer_5 Harassment tag consensus elapsed time: 1.87 secs (0.03 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_5 Helpfulness scores post-harassment elapsed time: 0.14 secs (0.00 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 3852 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 16995 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 3458 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 2990 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 237917 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 183951 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 2990, Notes: 22175 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 8.295422773393462 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 61.52207357859532 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.38200655579566956 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.30709701776504517 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.09668667614459991 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05983196198940277 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.09239980578422546 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05907268077135086 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_3. Original rating length: 118317340 | |
INFO:birdwatch.scorer: Ratings after topic filter: 118317340 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09099796414375305 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.056515373289585114 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.09087927639484406 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.056339509785175323 | |
INFO:birdwatch.matrix_factorization:Num epochs: 81 | |
INFO:birdwatch.matrix_factorization:epoch 81 0.09087927639484406 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.056339509785175323 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.18524184823036194 | |
INFO:birdwatch.constants:Final round MF elapsed time: 2.11 secs (0.04 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_5 prescoring, about to call diligence with 183951 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 001670E335A2559879EA4C5497E9469BD163D949F32CFB... 0.694623 | |
1 003DAEE4C05D42B92583AD7BB4E5FC40051E7EDB8A34F4... -0.533865 | |
2 003EF532CDFCF35BC31EA059E3C981E2866B6FE12DFFE3... -0.446522 | |
3 0078B6E44FB3B19530E03D5FF363823AE29AEF431E16A4... -0.326030 | |
4 007931FC488902DD0A8CB7AA24BFAB189E614C73CCAB9E... -0.428151 | |
... ... ... | |
2985 FF9AD1E27202E08F5B4E371E2F9CDEFD12B04407DD00E4... -0.146677 | |
2986 FFAC3C1B41112324A7D9677419DF2C179D47327EFC3458... -0.230635 | |
2987 FFB5DC98D9D19D482617D7D9F61B91DFB74F2B5588EADC... 0.316966 | |
2988 FFBF66FB8FE4AEF510F7CD3F18B24F5FCCD83CFBFB4F0E... -0.668074 | |
2989 FFC5FEB6111C3D7EEE8617D8CDE530946BE44871355D9D... 0.003866 | |
[2990 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 2990, vs. num we are initializing: 2990 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 2990 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen
INFO:birdwatch.reputation_matrix_factorization:Round 1:
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=16.861629 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.271580 | time=0.5s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.814015 | time=1.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.722945 | time=1.5s
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.689590 | time=2.1s
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.670991 | time=2.6s
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel
0 56284448
1 3831946
dtype: int64
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.659049 | time=3.1s
INFO:birdwatch.scorer: Ratings without assigned group: 0
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.650878 | time=3.6s
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=2.645420 | time=4.1s
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=2.641486 | time=4.6s
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=2.638443 | time=5.1s
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing:
tensor(-0.1347, requires_grad=True)
INFO:birdwatch.reputation_matrix_factorization:
Round 2: learn rater rep (and everything else), freeze note intercept
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=2.638358 | time=0.0s
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.560054 | time=0.5s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.550052 | time=1.1s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.549247 | time=1.6s
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.549219 | time=1.8s
INFO:birdwatch.reputation_matrix_factorization:
Round 3: fit intercepts and global intercept with everything else frozen
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.552185 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.488989 | time=0.3s
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.06374211334099647
INFO:birdwatch.matrix_factorization:Using pos weight: 14.688215334976015 with BCEWithLogitsLoss
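The positive rate and pos weight logged here are consistent with the post-filtering label breakdown printed a few lines earlier. Dividing 3831946 positives by 56284448 + 3831946 rows gives the logged rate, and negatives / positives, which equals (1 - p) / p, gives the logged weight. The later breakdown of 710026 negatives and 69423 positives matches its logged rate and weight in the same way. In PyTorch, `torch.nn.BCEWithLogitsLoss(pos_weight=...)` multiplies the loss on positive examples by that factor, which offsets the class imbalance. The sketch below only reproduces the arithmetic. The formula is inferred from the numbers matching, not confirmed from the scorer source, and `positive_rate_and_weight` is a hypothetical name:

```python
def positive_rate_and_weight(num_negative: int, num_positive: int) -> tuple[float, float]:
    """Return (p, w) with p = pos / (pos + neg) and w = neg / pos = (1 - p) / p."""
    if num_positive <= 0:
        raise ValueError("need at least one positive label")
    total = num_negative + num_positive
    return num_positive / total, num_negative / num_positive


if __name__ == "__main__":
    # Label counts copied from the two post-filtering breakdowns in this log.
    for neg, pos in [(56284448, 3831946), (710026, 69423)]:
        p, w = positive_rate_and_weight(neg, pos)
        print(f"positive rate {p:.6f}, pos weight {w:.6f}")
```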
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.488173 | time=0.6s
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.488139 | time=0.7s
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing:
tensor(0.6464, requires_grad=True)
INFO:birdwatch.diligence_model:Low diligence training loss: 2.6384, 1.5492, 0.4881
INFO:birdwatch.scorer:MFGroupScorer_5 Low Diligence MF elapsed time: 7.85 secs (0.13 mins)
INFO:birdwatch.scorer: Ratings after group filter: 1911572
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
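These `(allowed)` merge warnings report integer count columns coming back as `float64`. A common cause in pandas is a left merge in which some left-hand keys have no match. The unmatched rows receive `NaN`, and a NumPy `int64` column cannot hold `NaN`, so pandas upcasts the column to `float64`. Below is a small reproduction on hypothetical data, with two ways to restore an integer type: fill with 0, or switch to pandas' nullable `Int64` dtype. Which one suits `compute_note_stats` depends on whether a missing count should mean zero, and the log does not say:

```python
import pandas as pd

notes = pd.DataFrame({"noteId": [1, 2, 3]})
# Only notes 1 and 3 have rating stats; note 2 has none.
stats = pd.DataFrame({"noteId": [1, 3], "numRatings": [12, 7]})

merged = notes.merge(stats, on="noteId", how="left")
print(merged["numRatings"].dtype)  # float64, because note 2 is NaN after the merge

# Option 1: treat a missing count as zero and cast back to int64.
as_int = merged.assign(numRatings=merged["numRatings"].fillna(0).astype("int64"))

# Option 2: keep "missing" distinct from 0 with the nullable integer dtype.
as_nullable = merged.assign(numRatings=merged["numRatings"].astype("Int64"))
```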
INFO:birdwatch.scorer:MFGroupScorer_4 Filter input elapsed time: 45.04 secs (0.75 mins)
INFO:birdwatch.mf_base_scorer:seeding with 0
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.58 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_4: 6dae5f774d3e567d9aa37542a636926baa18a742be1611d99f276911efd08db8
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.68 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.61 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins)
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note:
Num Ratings: 1507143, Num Unique Notes Rated: 60014, Num Unique Raters: 14399
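This summary reports how many ratings, notes, and raters survive a minimum of 10 ratings per rater and 5 raters per note. The `Ratings per note` and `Ratings per user` figures that the matrix factorization step prints for MFGroupScorer_4 are these ratings divided by notes and by raters (1507143 / 60014 ≈ 25.113 and 1507143 / 14399 ≈ 104.670). Dropping notes can push a rater below the threshold and vice versa, so the sketch below repeats both filters until nothing changes. The fixed-point loop, the filter order, and the assumption of at most one rating per (rater, note) pair are choices made for this illustration, not confirmed from the scorer source. `filter_min_counts` and `summarize` are hypothetical names:

```python
from collections import Counter


def filter_min_counts(ratings, min_ratings_per_rater=10, min_raters_per_note=5):
    """Drop (rater_id, note_id) pairs until every rater and note meets its minimum.

    Assumes each rater rates a given note at most once, so counting rows per
    note is the same as counting distinct raters per note.
    """
    current = list(ratings)
    while True:
        raters_per_note = Counter(note for _, note in current)
        kept = [(r, n) for r, n in current if raters_per_note[n] >= min_raters_per_note]
        ratings_per_rater = Counter(r for r, _ in kept)
        kept = [(r, n) for r, n in kept if ratings_per_rater[r] >= min_ratings_per_rater]
        if len(kept) == len(current):  # nothing removed, so both thresholds hold
            return kept
        current = kept


def summarize(ratings):
    """Counts in the spirit of the log's summary and per-note / per-user lines."""
    notes = {n for _, n in ratings}
    raters = {r for r, _ in ratings}
    return {
        "num_ratings": len(ratings),
        "num_notes": len(notes),
        "num_raters": len(raters),
        "ratings_per_note": len(ratings) / len(notes) if notes else 0.0,
        "ratings_per_rater": len(ratings) / len(raters) if raters else 0.0,
    }
```

A single pass would be cheaper but could leave a rater under the threshold after their notes were removed. Looping until the row count stops changing rules that out.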
INFO:birdwatch.scorer:MFGroupScorer_4 Prepare ratings elapsed time: 0.73 secs (0.01 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.58 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.72 secs (0.01 mins)
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_4: 0b363ba441451b25ef409f6453b14bff88477c94a8a8efd52070b32487252ef2
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_4: bab26056c2c51b2beff057f4e0fe86fa071404051fc10055597e708318a26b55
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_4: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614
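Each `summary` line carries a 64-character hex string, which is the length of a SHA-256 digest. Across this log, the `noteStatusHistory` and `userEnrollmentRaw` summaries are identical for MFGroupScorer_2, MFGroupScorer_3, and MFGroupScorer_4, while the `ratings` and `ratingsForTraining` summaries differ per scorer. That pattern fits every group scorer sharing the same note-status and enrollment inputs while training on a different subset of ratings. Below is a minimal sketch of one way to fingerprint a DataFrame, using `pandas.util.hash_pandas_object` and `hashlib.sha256`. This is an assumed technique, not necessarily how the scorer builds its summaries, and `frame_fingerprint` is a hypothetical name:

```python
import hashlib

import pandas as pd


def frame_fingerprint(df: pd.DataFrame) -> str:
    """Return a SHA-256 hex digest over per-row hashes of the values and index."""
    row_hashes = pd.util.hash_pandas_object(df, index=True)  # one uint64 per row
    return hashlib.sha256(row_hashes.to_numpy().tobytes()).hexdigest()
```

This fingerprint is order-sensitive and covers values and the index but not column names. Any run comparison built on it should sort rows consistently first.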
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 14399, Notes: 60014
INFO:birdwatch.matrix_factorization:learning rate set to :1.0
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 25.113190255607027
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 104.66997708174179
INFO:birdwatch.matrix_factorization:epoch 0 6.698787689208984
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.223075866699219
INFO:birdwatch.matrix_factorization:epoch 20 0.31589454412460327
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.25655877590179443
INFO:birdwatch.matrix_factorization:epoch 40 0.13759097456932068
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10495001077651978
INFO:birdwatch.matrix_factorization:epoch 60 0.10322079062461853
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07258222997188568
INFO:birdwatch.matrix_factorization:epoch 80 0.09771555662155151
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06851274520158768
INFO:birdwatch.scorer: Ratings without assigned group: 0
INFO:birdwatch.matrix_factorization:epoch 100 0.09711906313896179
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06810428202152252
INFO:birdwatch.matrix_factorization:epoch 120 0.09703299403190613
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06803431361913681
INFO:birdwatch.scorer: Ratings after group filter: 6154771
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 532307, Notes: 1096849
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
  noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
  noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
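This `FutureWarning` concerns calling `fillna(..., inplace=True)` on a single selected column. Under the pandas 3.0 behavior the warning describes, the selected column always acts as a copy, so the call would leave `noteInit` unchanged. The two replacements the warning itself suggests are shown below on hypothetical frames. The literal column names `internalNoteIntercept` and `internalNoteFactor1` are assumed to be what `c.internalNoteInterceptKey` and `c.note_factor_key(1)` hold, based on the parameter names printed elsewhere in this log:

```python
import numpy as np
import pandas as pd

note_init = pd.DataFrame(
    {"noteId": [1, 2, 3], "internalNoteIntercept": [0.4, np.nan, -0.1]}
)

# Flagged pattern (fills a Series that may be a temporary copy):
#   note_init["internalNoteIntercept"].fillna(0.0, inplace=True)

# Option 1: assign the filled column back to the frame.
note_init["internalNoteIntercept"] = note_init["internalNoteIntercept"].fillna(0.0)

# Option 2: call fillna on the DataFrame itself with a {column: value} mapping.
factors = pd.DataFrame({"internalNoteFactor1": [np.nan, 0.2, np.nan]})
factors.fillna({"internalNoteFactor1": 0.0}, inplace=True)
```

Option 1 is the most explicit. Option 2 is convenient when several columns need different fill values in one call.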
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:epoch 140 0.0970223918557167
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06801554560661316
INFO:birdwatch.matrix_factorization:learning rate set to :2.0
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 54.80962283778351
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 112.93836075798365
INFO:birdwatch.scorer:MFGroupScorer_3 Filter input elapsed time: 45.92 secs (0.77 mins)
INFO:birdwatch.mf_base_scorer:seeding with 0
INFO:birdwatch.matrix_factorization:Num epochs: 145
INFO:birdwatch.matrix_factorization:epoch 145 0.09702189266681671
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06801630556583405
INFO:birdwatch.matrix_factorization:Global Intercept: 0.16772010922431946
INFO:birdwatch.scorer:MFGroupScorer_4 First MF/stable init elapsed time: 24.91 secs (0.42 mins)
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_4
INFO:birdwatch.matrix_factorization:epoch 60 0.44839948415756226
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3275475800037384
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
INFO:birdwatch.matrix_factorization:epoch 0 3.2308831214904785
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.471374273300171
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.65 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_3: 219a2d389f6536ef18f19193931800b0694640cb02c48c93fd879e7a8fda6ca0
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.70 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.63 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.59 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins)
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.82 secs (0.56 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.70 secs (0.01 mins)
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note:
Num Ratings: 5436988, Num Unique Notes Rated: 168222, Num Unique Raters: 50733
INFO:birdwatch.scorer:MFGroupScorer_3 Prepare ratings elapsed time: 2.67 secs (0.04 mins)
INFO:birdwatch.constants:MFGroupScorer_5: Compute tag thresholds for percentiles elapsed time: 0.52 secs (0.01 mins)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge(
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed)
INFO:birdwatch.run_scoring:MFGroupScorer_2 run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_3: cddd5a08fb6bb9838d0866f5839d663c6701abecf9e154d89a98b491dc08895a
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_3: bab26056c2c51b2beff057f4e0fe86fa071404051fc10055597e708318a26b55
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_3: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 50733, Notes: 168222
INFO:birdwatch.matrix_factorization:learning rate set to :1.0
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 32.32031482208035
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 107.16866733684189
INFO:birdwatch.matrix_factorization:epoch 0 6.160441875457764
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 5.707648277282715
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 532290, Notes: 1096847
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
  noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
  noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:learning rate set to :2.0
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 54.80836798568989
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 112.93917601307558
INFO:birdwatch.run_scoring:MFGroupScorer_2 run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFGroupScorer_2 run_scorer_parallelizable: Loading data elapsed time: 19.34 secs (0.32 mins)
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_2 set to: 4
INFO:birdwatch.matrix_factorization:epoch 20 0.35091888904571533
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.28343868255615234
INFO:birdwatch.matrix_factorization:epoch 0 3.230905532836914
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.4713935852050781
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_2. Original rating length: 118317340
INFO:birdwatch.scorer: Ratings after topic filter: 118317340
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.84 secs (0.58 mins)
INFO:birdwatch.scorer:MFGroupScorer_4 Compute scored notes elapsed time: 43.27 secs (0.72 mins)
INFO:birdwatch.matrix_factorization:epoch 40 0.14208878576755524
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10886015743017197
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed)
INFO:birdwatch.note_ratings:Total ratings: 1911260 post-tombstones and 312 pre-tombstones
INFO:birdwatch.note_ratings:Total ratings created before statuses: 1578669, including 1578669 post-tombstones and 0 pre-tombstones.
INFO:birdwatch.note_ratings:Total valid ratings: 108931
INFO:birdwatch.scorer:MFGroupScorer_4 Compute valid ratings elapsed time: 1.73 secs (0.03 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  scoredNotes.loc[:, c.noteCountKey] = 1
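`SettingWithCopyWarning` means pandas could not tell whether `scoredNotes` is an independent frame or a slice of another one, which typically happens when it was produced by filtering a larger DataFrame. Switching to `.loc`, as the message suggests, does not help when the object itself is the slice, and this line already uses `.loc`. The usual remedies are to take an explicit `.copy()` when the subset is created, or to add the column with `assign`, which always returns a new frame. The sketch below uses hypothetical data. The `numRatings >= 5` filter is invented for illustration, and only the `noteCount` column name comes from this log:

```python
import pandas as pd

notes = pd.DataFrame({"noteId": [1, 2, 3, 4], "numRatings": [3, 12, 8, 1]})

# Explicit copy: scored_notes no longer shares data with `notes`,
# so adding a column neither warns nor touches the original.
scored_notes = notes[notes["numRatings"] >= 5].copy()
scored_notes.loc[:, "noteCount"] = 1

# Equivalent without a separate copy step: assign returns a new DataFrame.
scored_notes_alt = notes[notes["numRatings"] >= 5].assign(noteCount=1)
```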
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join(
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed)
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed)
INFO:birdwatch.scorer:MFGroupScorer_4 Helpfulness scores pre-harassment elapsed time: 0.28 secs (0.00 mins)
INFO:birdwatch.helpfulness_scores:Unique Raters: 14399
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 38413
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 12009
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 11218
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 1507143
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 1252671
INFO:birdwatch.scorer:MFGroupScorer_4 Filtering by helpfulness score elapsed time: 1.91 secs (0.03 mins)
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse-------------------
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel
0 764650
1 79689
dtype: int64
INFO:birdwatch.tag_consensus:Number of rows with no tag label 408332
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note:
Num Ratings: 779449, Num Unique Notes Rated: 37421, Num Unique Raters: 10540
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel
0 710026
1 69423
dtype: int64
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.08906676382932045
INFO:birdwatch.matrix_factorization:Using pos weight: 10.227532662086054 with BCEWithLogitsLoss
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 10540, Notes: 37421
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
  noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
  noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:learning rate set to :2.0
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 20.82918682023463
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 73.95151802656547
INFO:birdwatch.matrix_factorization:epoch 0 3.406766653060913
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.5144782066345215
INFO:birdwatch.matrix_factorization:epoch 20 0.670211911201477
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.356147825717926
INFO:birdwatch.matrix_factorization:epoch 40 0.4503651261329651
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2893131375312805
INFO:birdwatch.matrix_factorization:epoch 60 0.4171738922595978
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.277433842420578
INFO:birdwatch.matrix_factorization:epoch 80 0.4127207398414612
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2751638889312744
INFO:birdwatch.matrix_factorization:epoch 60 0.11586557328701019
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08810189366340637
INFO:birdwatch.matrix_factorization:epoch 100 0.4121388792991638
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.27491995692253113
INFO:birdwatch.matrix_factorization:epoch 120 0.41206347942352295
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2749044597148895
INFO:birdwatch.matrix_factorization:Num epochs: 132
INFO:birdwatch.matrix_factorization:epoch 132 0.4120563864707947
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2748928964138031
INFO:birdwatch.matrix_factorization:Global Intercept: -0.20987597107887268
INFO:birdwatch.scorer:MFGroupScorer_4 Harassment tag consensus elapsed time: 13.01 secs (0.22 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  scoredNotes.loc[:, c.noteCountKey] = 1
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join(
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed)
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
  helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True)
INFO:birdwatch.scorer:MFGroupScorer_4 Helpfulness scores post-harassment elapsed time: 0.38 secs (0.01 mins)
INFO:birdwatch.helpfulness_scores:Unique Raters: 14399
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 38413
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 10669
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 9878
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 1507143
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 966156 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 9878, Notes: 59954 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 16.114954798678987 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 97.80886819194168 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.38886088132858276 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.32036033272743225 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.09786438941955566 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0646565780043602 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.0945834070444107 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06539532542228699 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.11031527817249298 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08270066231489182 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09400955587625504 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06439482420682907 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.4420861601829529 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3238765597343445 | |
INFO:birdwatch.scorer: Ratings after group filter: 1485387 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.09385792165994644 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06359776109457016 | |
INFO:birdwatch.scorer:MFGroupScorer_2 Filter input elapsed time: 45.10 secs (0.75 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.09385128319263458 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06359480321407318 | |
INFO:birdwatch.matrix_factorization:Num epochs: 101 | |
INFO:birdwatch.matrix_factorization:epoch 101 0.09385128319263458 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06359480321407318 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.16988325119018555 | |
INFO:birdwatch.constants:Final round MF elapsed time: 13.41 secs (0.22 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_4 prescoring, about to call diligence with 966156 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 0011AB5425173F62E5D4A1787E34ED324BDD5807D4C3B8... -0.559426 | |
1 001C8D32D1F35CAC07983265BA3F769C6976F5A71141E4... 0.409342 | |
2 0026D52237BA91FDF564C99A30B594C53E0E5E7CF76F5C... 0.577273 | |
3 003B5BBD63338E6ECB7DA6F16AC010576B506676849D76... 0.316241 | |
4 003CE80F068D189A05BBA9748FCA578819680378FBDEB7... -0.450130 | |
... ... ... | |
9873 FFCB30F2118337303F4EBFD59C8A33E85A2C7276BD67C1... -0.457454 | |
9874 FFD3B8B9E935D1D393558464F9172AF81C6CF5E76C31EA... 0.380038 | |
9875 FFDCC6136CBDCE1394D680A912CB4203DE5D035006979B... 0.509950 | |
9876 FFEC392A6B742286C786DE71BB4102B6804FF360A00B3A... 0.175354 | |
9877 FFF89590FF300D0348631F2F16AA908F663A888A3F82E0... -0.278656 | |
[9878 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 9878, vs. num we are initializing: 9878 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 9878 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=19.016602 | time=0.0s | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_2: 69a9fee793c8a7edda8b8d4d229ca3a91f2881b198ec553cf1932d65b626cec8 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 890938, Num Unique Notes Rated: 68353, Num Unique Raters: 9775 | |
INFO:birdwatch.scorer:MFGroupScorer_2 Prepare ratings elapsed time: 0.48 secs (0.01 mins) | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_2: f94e1c2eb904b65e6015e37c6de0eeb390703f154f7f6aea964abc8cbb1867cc | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_2: bab26056c2c51b2beff057f4e0fe86fa071404051fc10055597e708318a26b55 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.621771 | time=2.3s | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_2: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 9775, Notes: 68353 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 13.034365719134493 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 91.14455242966751 | |
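The two density figures above appear to be plain quotients of the counts that process_data reported for MFGroupScorer_2 (890938 ratings, 68353 notes, 9775 raters). This is an assumption, because the scorer source is not shown here. The minimal sketch below recomputes them from those logged counts.

```python
# Counts copied from the MFGroupScorer_2 process_data log line above.
num_ratings = 890938
num_notes = 68353
num_raters = 9775

# Assumed definition: average ratings per note and per rater.
ratings_per_note = num_ratings / num_notes
ratings_per_user = num_ratings / num_raters

print(f"Ratings per note in dataset: {ratings_per_note}")
print(f"Ratings per user in dataset: {ratings_per_user}")
```

Both values agree with the logged 13.0343... and 91.1445..., which supports the assumed definition.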
INFO:birdwatch.matrix_factorization:epoch 0 6.517404079437256 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.035797119140625 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.150838 | time=4.7s | |
INFO:birdwatch.matrix_factorization:epoch 20 0.37099361419677734 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.27346354722976685 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10985220968723297 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08248025923967361 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.15487666428089142 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11730708181858063 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.080948 | time=7.0s | |
INFO:birdwatch.matrix_factorization:epoch 60 0.11793714761734009 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08650939911603928 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.061856 | time=9.3s | |
INFO:birdwatch.matrix_factorization:epoch 20 0.7068929672241211 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.4391211271286011 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.11242350935935974 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0823739767074585 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.053419 | time=11.7s | |
INFO:birdwatch.matrix_factorization:epoch 100 0.11172764748334885 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0817120373249054 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.048802 | time=14.1s | |
INFO:birdwatch.matrix_factorization:epoch 120 0.111640065908432 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0816383808851242 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.1116287037730217 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08163300156593323 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.045836 | time=16.5s | |
INFO:birdwatch.matrix_factorization:Num epochs: 147 | |
INFO:birdwatch.matrix_factorization:epoch 147 0.11162795126438141 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08163106441497803 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1721932739019394 | |
INFO:birdwatch.scorer:MFGroupScorer_2 First MF/stable init elapsed time: 14.32 secs (0.24 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_2 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.10978099703788757 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08239739388227463 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.043804 | time=18.8s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
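The "(allowed)" mismatches above report integer count columns coming out of a merge as float64. The source of compute_note_stats is not included in this log. A common cause, assumed here, is a left merge that leaves some rows unmatched. pandas fills those rows with NaN, and NaN cannot live in an int64 column. The sketch below uses hypothetical frames to show the upcast and two ways to keep integer counts.

```python
import pandas as pd

# Hypothetical frames; only the dtype behaviour matters here.
notes = pd.DataFrame({"noteId": [1, 2, 3]})
counts = pd.DataFrame({"noteId": [1, 2], "numRatings": [5, 7]})

# Note 3 has no match, so the left merge inserts NaN and upcasts int64 to float64.
merged = notes.merge(counts, on="noteId", how="left")
print(merged["numRatings"].dtype)

# Option 1: fill the gaps with 0, then cast back to int64.
filled = merged.assign(numRatings=merged["numRatings"].fillna(0).astype("int64"))
print(filled["numRatings"].dtype)

# Option 2: use pandas' nullable Int64 dtype, which stores missing values as <NA>.
counts_nullable = counts.astype({"numRatings": "Int64"})
merged_nullable = notes.merge(counts_nullable, on="noteId", how="left")
print(merged_nullable["numRatings"].dtype)
```

Whether 0 or <NA> is the right value for "no ratings" depends on how downstream scoring rules read the column.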
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.042342 | time=21.4s | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.65 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.79 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.63 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.59 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.041303 | time=23.9s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(0.7280, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.041274 | time=0.0s | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.74 secs (0.01 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.912494 | time=2.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.901581 | time=4.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.900838 | time=6.7s | |
INFO:birdwatch.matrix_factorization:epoch 140 0.109770767390728 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08237619698047638 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.900815 | time=7.4s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.454399 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.380901 | time=1.3s | |
INFO:birdwatch.matrix_factorization:Num epochs: 146 | |
INFO:birdwatch.matrix_factorization:epoch 146 0.10977018624544144 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08237743377685547 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.16864676773548126 | |
INFO:birdwatch.scorer:MFGroupScorer_3 First MF/stable init elapsed time: 94.30 secs (1.57 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_3 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.380001 | time=2.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.379962 | time=3.3s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(1.8078, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 3.0413, 1.9008, 0.3800 | |
INFO:birdwatch.scorer:MFGroupScorer_4 Low Diligence MF elapsed time: 35.72 secs (0.60 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.59 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:epoch 20 0.7068942785263062 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.43912115693092346 | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.72 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.60 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.63 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.78 secs (0.01 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.62 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.68 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.65 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.56 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.71 secs (0.01 mins) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.58 secs (0.56 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_2 Compute scored notes elapsed time: 41.63 secs (0.69 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.note_ratings:Total ratings: 1484989 post-tombstones and 398 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 1174087, including 1174087 post-tombstones and 0 pre-tombstones. | |
INFO:birdwatch.note_ratings:Total valid ratings: 89077 | |
INFO:birdwatch.scorer:MFGroupScorer_2 Compute valid ratings elapsed time: 1.37 secs (0.02 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
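The SettingWithCopyWarning above fires on `scoredNotes.loc[:, c.noteCountKey] = 1`. That suggests `scoredNotes` was derived from another DataFrame, for example by boolean filtering. This is an assumption, because helpfulness_scores.py is not shown. A usual remedy is to take an explicit `.copy()` when the subset is created. The hypothetical sketch below illustrates this.

```python
import pandas as pd

# Hypothetical source frame; "noteCount" mirrors the column in the warning.
allNotes = pd.DataFrame({"noteId": [1, 2, 3], "status": ["CRH", "NMR", "CRH"]})

# Boolean filtering can yield an object pandas treats as a slice of allNotes.
# An explicit copy makes scoredNotes an independent frame.
scoredNotes = allNotes[allNotes["status"] == "CRH"].copy()

# This assignment now updates scoredNotes only, and raises no warning.
scoredNotes.loc[:, "noteCount"] = 1
```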
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_2 Helpfulness scores pre-harassment elapsed time: 0.23 secs (0.00 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 9775 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 42202 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 9216 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 8003 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 890938 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 776161 | |
INFO:birdwatch.scorer:MFGroupScorer_2 Filtering by helpfulness score elapsed time: 1.06 secs (0.02 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 490341 | |
1 24919 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 260901 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 417285, Num Unique Notes Rated: 33587, Num Unique Raters: 7081 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 402661 | |
1 14624 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.035045592340966006 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 27.53425875273523 with BCEWithLogitsLoss | |
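The positive rate and the BCEWithLogitsLoss pos weight logged above agree with two simple formulas, applied to the post-filtering label breakdown for notHelpfulSpamHarassmentOrAbuse. The positive rate is positives / total, and the pos weight is negatives / positives. That the scorer uses exactly these formulas is an assumption inferred from the matching numbers, not from the source. The sketch below checks the assumption against the logged figures.

```python
def label_balance(negatives: int, positives: int) -> tuple[float, float]:
    """Return (positive rate, pos weight) under the assumed formulas."""
    positive_rate = positives / (negatives + positives)
    pos_weight = negatives / positives  # negatives per positive example
    return positive_rate, pos_weight


# Post-filtering counts logged for MFGroupScorer_2 (label 0, label 1).
rate, weight = label_balance(402661, 14624)
print(rate, weight)
```

The same formulas also reproduce the figures that MFGroupScorer_3 logs later (counts 2819844 and 127107, giving a rate of about 0.0431 and a weight of about 22.18).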
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 7081, Notes: 33587 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
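Both FutureWarnings above flag `noteInit[col].fillna(0.0, inplace=True)`. The warning text states that this chained inplace call will stop updating `noteInit` in pandas 3.0, and it names two replacements. The sketch below applies the assign-back form to a hypothetical stand-in for `noteInit`. Only the column name is taken from the log.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for noteInit; the column name mirrors the log.
noteInit = pd.DataFrame({"internalNoteIntercept": [0.3, np.nan, -0.1]})

# Flagged pattern (chained assignment plus inplace):
#   noteInit["internalNoteIntercept"].fillna(0.0, inplace=True)

# Replacement suggested by the warning: assign the filled column back.
noteInit["internalNoteIntercept"] = noteInit["internalNoteIntercept"].fillna(0.0)

# The other suggested form is a DataFrame-level call with a dict:
#   noteInit.fillna({"internalNoteIntercept": 0.0}, inplace=True)
```

The same change applies to the `helpfulnessScores[...].fillna(0, inplace=True)` call flagged later in this log.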
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 12.424003334623515 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 58.93023584239514 | |
INFO:birdwatch.matrix_factorization:epoch 0 3.1215243339538574 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.212156057357788 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.6161671280860901 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2901526689529419 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.381754070520401 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2078319489955902 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.3446827828884125 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1943131685256958 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.3381313383579254 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.19340667128562927 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.3370509147644043 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.19298066198825836 | |
INFO:birdwatch.matrix_factorization:Num epochs: 115 | |
INFO:birdwatch.matrix_factorization:epoch 115 0.33685457706451416 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.19294053316116333 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.2822979986667633 | |
INFO:birdwatch.scorer:MFGroupScorer_2 Harassment tag consensus elapsed time: 5.99 secs (0.10 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_2 Helpfulness scores post-harassment elapsed time: 0.27 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:epoch 100 0.44117575883865356 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.32340681552886963 | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 9775 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 42202 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 8753 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 7540 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 890938 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 658930 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 7540, Notes: 68286 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 9.649562135723281 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 87.39124668435014 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.3927237391471863 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3238324820995331 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.10900013148784637 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07585250586271286 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10410565137863159 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07415442913770676 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.1026550680398941 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07165952771902084 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.10253122448921204 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07152654230594635 | |
INFO:birdwatch.matrix_factorization:Num epochs: 95 | |
INFO:birdwatch.matrix_factorization:epoch 95 0.10250472277402878 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07144927978515625 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1710907518863678 | |
INFO:birdwatch.constants:Final round MF elapsed time: 8.21 secs (0.14 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_2 prescoring, about to call diligence with 658930 final round ratings. | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.98 secs (0.57 mins) | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 00007F6B0991C1CA1DF283A7615A79999117CAC8C962A5... 0.136753 | |
1 0000FDC49B38F4C994CAA60961F88FB421B03D0D43F499... -0.203846 | |
2 001BE45AE64F526CFC3CC1B706DE3D812A6063976CA65D... -0.439461 | |
3 0020A81474D2B3E0479ED2BB0A5577F54852D9381A5DD3... -0.151953 | |
4 00264FCCA9C6517FBBA613AA7F64C431078456BB521359... -0.625927 | |
... ... ... | |
7535 FFE33E8172BAD7A1575F60FCAB8012D6BE7798D2C8A26D... -0.325974 | |
7536 FFEC26DAD31FB175031B1A676DACDDFE983F60DAFA8985... -0.619723 | |
7537 FFF8F9C2C8D0118227B1D6295B8CF7BA535B2A44B2EDEF... -0.543683 | |
7538 FFFF33553CB8A72FF1CB6FB663CED93F292F0D2C161852... -0.398262 | |
7539 FFFF82FC0D34E74125C0E5C894E335531C58342FB7C039... 0.545447 | |
[7540 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 7540, vs. num we are initializing: 7540 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 7540 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=17.619812 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.479037 | time=1.6s | |
INFO:birdwatch.constants:MFGroupScorer_4: Compute tag thresholds for percentiles elapsed time: 2.16 secs (0.04 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.034774 | time=3.2s | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.28 secs (0.57 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_3 Compute scored notes elapsed time: 46.41 secs (0.77 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.946562 | time=4.7s | |
INFO:birdwatch.run_scoring:MFGroupScorer_1 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.915432 | time=6.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.899280 | time=8.3s | |
INFO:birdwatch.note_ratings:Total ratings: 6153608 post-tombstones and 1163 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 4564479, including 4564467 post-tombstones and 12 pre-tombstones. | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.888865 | time=10.4s | |
INFO:birdwatch.note_ratings:Total valid ratings: 484421 | |
INFO:birdwatch.scorer:MFGroupScorer_3 Compute valid ratings elapsed time: 7.58 secs (0.13 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_3 Helpfulness scores pre-harassment elapsed time: 0.63 secs (0.01 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.881603 | time=12.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=2.876575 | time=14.0s | |
INFO:birdwatch.matrix_factorization:epoch 40 0.48588138818740845 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3469192087650299 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=2.872807 | time=15.7s | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 50733 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 101066 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 43754 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=2.870011 | time=17.3s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(0.1921, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=2.869933 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.731862 | time=1.6s | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 40037 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 5436988 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 4441479 | |
INFO:birdwatch.scorer:MFGroupScorer_3 Filtering by helpfulness score elapsed time: 7.30 secs (0.12 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 2973534 | |
1 146458 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 1321487 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.720853 | time=3.2s | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 2946951, Num Unique Notes Rated: 106653, Num Unique Raters: 37981 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 2819844 | |
1 127107 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.04313169781241697 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 22.184804928131417 with BCEWithLogitsLoss | |
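You can reproduce the positive rate and pos weight above from the post-filtering label counts. The rate is positives divided by all labelled rows. The weight matches negatives divided by positives (2819844 / 127107), which is a common choice for the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`. This relationship is inferred from the logged numbers rather than read from the scorer source, and the helper name is hypothetical.

```python
def positive_rate_and_pos_weight(num_negative: int, num_positive: int) -> tuple[float, float]:
    """Hypothetical helper: class-balance statistics for a binary tag label.

    Assumes pos_weight = negatives / positives; this matches the logged values
    but is inferred from the numbers, not taken from the scorer's code.
    """
    total = num_negative + num_positive
    positive_rate = num_positive / total
    pos_weight = num_negative / num_positive
    return positive_rate, pos_weight


# Post-filtering breakdown of notHelpfulSpamHarassmentOrAbuseLabel from the log above.
rate, weight = positive_rate_and_pos_weight(num_negative=2819844, num_positive=127107)
print(rate, weight)
```

Weighting each positive by roughly 22 makes the rare positive label (about 4.3% of rows) contribute about as much total loss as the negatives.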
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.719774 | time=4.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.719740 | time=5.3s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.507944 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.437715 | time=0.9s | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 37981, Notes: 106653 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
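Both FutureWarnings flag the same pattern: `fillna(..., inplace=True)` is called on a column selected from `noteInit`. Under pandas' Copy-on-Write behaviour, which the warning says takes effect in pandas 3.0, the selected column always behaves as a copy. The in-place fill would then no longer reach `noteInit`. The sketch below shows the two rewrites the warning itself suggests. It uses a hypothetical toy frame in place of the real `noteInit`, and plain strings in place of the `c.*` column-name constants.

```python
import pandas as pd

# Hypothetical stand-in for noteInit; real column names come from constants (c.*).
noteInit = pd.DataFrame({
    "noteId": [1, 2, 3],
    "internalNoteIntercept": [0.5, None, -0.2],
    "internalNoteFactor1": [None, 0.1, None],
})

# Pattern flagged in the log (chained assignment + inplace). Under Copy-on-Write
# it would fill a temporary copy and leave noteInit unchanged:
#   noteInit["internalNoteIntercept"].fillna(0.0, inplace=True)

# Rewrite 1: assign the filled column back explicitly.
noteInit["internalNoteIntercept"] = noteInit["internalNoteIntercept"].fillna(0.0)

# Rewrite 2: call fillna on the DataFrame itself with a per-column mapping.
noteInit.fillna({"internalNoteFactor1": 0.0}, inplace=True)
```

Both forms operate on `noteInit` directly, so they behave the same with or without Copy-on-Write.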
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 27.63120587325251 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 77.59013717385008 | |
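The two density figures are the rating count divided by the number of notes and by the number of users reported just above (2946951 ratings, 106653 notes, 37981 users). A quick arithmetic check, using a hypothetical helper name:

```python
def ratings_density(num_ratings: int, num_notes: int, num_users: int) -> tuple[float, float]:
    """Hypothetical helper: average ratings per note and per user."""
    return num_ratings / num_notes, num_ratings / num_users


per_note, per_user = ratings_density(num_ratings=2946951, num_notes=106653, num_users=37981)
print(per_note, per_user)
```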
INFO:birdwatch.matrix_factorization:epoch 0 3.460616111755371 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.5580065250396729 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.436819 | time=1.8s | |
INFO:birdwatch.run_scoring:MFGroupScorer_1 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_1 run_scorer_parallelizable: Loading data elapsed time: 19.94 secs (0.33 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_1 set to: 4 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.436783 | time=2.3s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(1.1650, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 2.8700, 1.7197, 0.4368 | |
INFO:birdwatch.scorer:MFGroupScorer_2 Low Diligence MF elapsed time: 25.66 secs (0.43 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
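The `numRatings` and `numRatingsLast28` mismatches (float64 where int64 was expected) are consistent with a left merge in which some notes have no matching count row. pandas fills those rows with NaN. Because NaN cannot be stored in an int64 column, the column is upcast to float64. The log does not show the merge arguments at `note_ratings.py` line 322, so this is a likely cause rather than a confirmed one. The toy frames below are hypothetical.

```python
import pandas as pd

notes = pd.DataFrame({"noteId": [10, 20, 30]})
counts = pd.DataFrame({"noteId": [10, 20], "numRatings": [5, 7]})  # int64 column

merged = notes.merge(counts, on="noteId", how="left")
# noteId 30 has no match, so its numRatings is NaN and the column becomes float64.
print(merged["numRatings"].dtype)

# Option A: fill the gaps, then cast back (only if 0 is a meaningful default).
filled = merged.assign(numRatings=merged["numRatings"].fillna(0).astype("int64"))

# Option B: keep the gaps as missing values using pandas' nullable integer dtype.
nullable = merged.assign(numRatings=merged["numRatings"].astype("Int64"))
```

Option A changes the meaning of the data ("no ratings" becomes 0). Option B preserves the distinction between zero and missing at the cost of a non-NumPy dtype.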
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_1. Original rating length: 118317340 | |
INFO:birdwatch.scorer: Ratings after topic filter: 118317340 | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.62 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.84 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:epoch 20 0.6890877485275269 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3774817883968353 | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.66 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.58 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.75 secs (0.01 mins) | |
INFO:birdwatch.matrix_factorization:epoch 40 0.4575643539428711 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.29412174224853516 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.42446714639663696 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.28245243430137634 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.48587948083877563 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.34691593050956726 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.42003875970840454 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.28034573793411255 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.4193997383117676 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2799588739871979 | |
INFO:birdwatch.matrix_factorization:Num epochs: 106 | |
INFO:birdwatch.matrix_factorization:epoch 106 0.4193531274795532 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.27994611859321594 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.25034454464912415 | |
INFO:birdwatch.scorer:MFGroupScorer_3 Harassment tag consensus elapsed time: 40.84 secs (0.68 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
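The SettingWithCopyWarning is raised even though the assignment already uses `.loc`. That usually means `scoredNotes` was itself derived from another DataFrame by indexing, so pandas cannot tell whether the write should reach the parent. The log does not show how `scoredNotes` is built in `helpfulness_scores.py`. The sketch assumes a boolean filter and shows the usual remedy: an explicit `.copy()` when the subset is created. The `notes` frame and its columns are hypothetical.

```python
import pandas as pd

notes = pd.DataFrame({"noteId": [1, 2, 3], "status": ["CRH", "NMR", "CRNH"]})

# Without .copy(), scoredNotes may be treated as a view of `notes`, and the
# .loc assignment below can trigger SettingWithCopyWarning:
#   scoredNotes = notes[notes["status"] != "NMR"]

# With an explicit copy, scoredNotes is an independent DataFrame.
scoredNotes = notes[notes["status"] != "NMR"].copy()
scoredNotes.loc[:, "noteCount"] = 1
```

The explicit copy also makes the intent clear: the new column belongs only to `scoredNotes`, and `notes` is left untouched.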
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_3 Helpfulness scores post-harassment elapsed time: 1.06 secs (0.02 mins) | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.4410301744937897 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.32334429025650024 | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 50733 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 101066 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 38195 | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.95 secs (0.57 mins) | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 34478 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 5436988 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 3234204 | |
INFO:birdwatch.constants:MFGroupScorer_2: Compute tag thresholds for percentiles elapsed time: 1.37 secs (0.02 mins) | |
INFO:birdwatch.scorer: Ratings after group filter: 5867441 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 34478, Notes: 167503 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 19.308334776093563 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 93.80486107082777 | |
INFO:birdwatch.scorer:MFGroupScorer_1 Filter input elapsed time: 45.77 secs (0.76 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.run_scoring:MFGroupScorer_14 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.matrix_factorization:epoch 0 0.37275373935699463 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.30735349655151367 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_1: ab2239f3406a2b93387dec9cea079ee7e658b7e9e8589566e27de20cd4346798 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.10647225379943848 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0762961283326149 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 5200616, Num Unique Notes Rated: 142070, Num Unique Raters: 54292 | |
INFO:birdwatch.scorer:MFGroupScorer_1 Prepare ratings elapsed time: 2.79 secs (0.05 mins) | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_1: baaa79a6aa23f09d4f361fd35751ca6c804bb0b4eb071f614db463656ea4fdc3 | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_1: bab26056c2c51b2beff057f4e0fe86fa071404051fc10055597e708318a26b55 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_1: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10482681542634964 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07767274230718613 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 54292, Notes: 142070 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 36.60601112127824 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 95.78972961025565 | |
INFO:birdwatch.run_scoring:MFGroupScorer_14 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_14 run_scorer_parallelizable: Loading data elapsed time: 19.31 secs (0.32 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFGroupScorer_14 set to: 4 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.311474323272705 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 5.859148025512695 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10351786762475967 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07491569966077805 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_14. Original rating length: 118317340 | |
INFO:birdwatch.scorer: Ratings after topic filter: 118317340 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.4542800784111023 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.33377817273139954 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.10346588492393494 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07493498921394348 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.3422960638999939 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.28132298588752747 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10345213115215302 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07470148801803589 | |
INFO:birdwatch.matrix_factorization:Num epochs: 103 | |
INFO:birdwatch.matrix_factorization:epoch 103 0.1034514307975769 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07485070824623108 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17145097255706787 | |
INFO:birdwatch.constants:Final round MF elapsed time: 48.81 secs (0.81 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_3 prescoring, about to call diligence with 3234204 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 000011269AD6F327AED0F4086A732B4052F9D28E8791E1... -0.100065 | |
1 00005300B9017670433392BF6767238D54E058EC25D5C5... -0.273228 | |
2 0006F1E9A72BC327122346B1EC672566F8DE4304BC7813... -0.178951 | |
3 0008CE6A2932D0D88C4965BDA83BD8CE906EC91A951066... -0.600315 | |
4 00099B57E40688AFECCE8A3415A2AC45FD8944C33ACB9C... -0.493003 | |
... ... ... | |
34473 FFFBB4B078CA1D3C3E23B986FA1A0BD4B3081E70C2B274... -0.750079 | |
34474 FFFC156EAADE44C6CB99B0EB02DB63AAA7DC330AFC0E4B... -0.632675 | |
34475 FFFC37B8B75A047FC218F52FF5F03C876A906BD09B0F34... 0.290217 | |
34476 FFFD98FC04D3E1615C8BF2617DA7EA6BAEDCED7C9BFDC0... -0.251977 | |
34477 FFFECB9745EFB9D109358D450779F68A96A14C9AC03AD4... -0.534232 | |
[34478 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 34478, vs. num we are initializing: 34478 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.13671498000621796 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10655282437801361 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 34478 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=14.793373 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.493970 | time=7.5s | |
INFO:birdwatch.matrix_factorization:epoch 60 0.11313247680664062 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08503865450620651 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.45427459478378296 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3337731957435608 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.158050 | time=14.9s | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.4410019516944885 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.323337197303772 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.110130 | time=22.4s | |
INFO:birdwatch.scorer: Ratings after group filter: 11195017 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.10888727009296417 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08239106088876724 | |
INFO:birdwatch.scorer:MFGroupScorer_14 Filter input elapsed time: 49.12 secs (0.82 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.matrix_factorization:Num epochs: 143 | |
INFO:birdwatch.matrix_factorization:epoch 143 0.4410010874271393 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3233371376991272 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.20351645350456238 | |
INFO:birdwatch.scorer:MFCoreScorer Harassment tag consensus elapsed time: 607.74 secs (10.13 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.097097 | time=29.8s | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10845153033733368 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0821102038025856 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.091225 | time=37.3s | |
INFO:birdwatch.mf_base_scorer:ratings summary MFGroupScorer_14: 82f8fda71f7fbf66bf5eb19fd50ac62affbbca86cc9803421711e0f246d23187 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.087781 | time=44.7s | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 10305089, Num Unique Notes Rated: 393602, Num Unique Raters: 59422 | |
INFO:birdwatch.scorer:MFGroupScorer_14 Prepare ratings elapsed time: 6.57 secs (0.11 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFCoreScorer Helpfulness scores post-harassment elapsed time: 18.44 secs (0.31 mins) | |
INFO:birdwatch.matrix_factorization:epoch 120 0.10838886350393295 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08207825571298599 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.085419 | time=52.1s | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFGroupScorer_14: fcc4c81174deeb13513bc6f2c7e015dadcb3a76a93f0b0ffce89df50628ddde0 | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFGroupScorer_14: bab26056c2c51b2beff057f4e0fe86fa071404051fc10055597e708318a26b55 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFGroupScorer_14: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614 | |
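The 64-hex-digit `summary` values look like SHA-256 digests of each scorer's inputs. The `noteStatusHistory` and `userEnrollmentRaw` digests are identical for MFGroupScorer_1 and MFGroupScorer_14, which you would expect if both parallel workers loaded the same shared data. The log does not show how the scorer computes these digests. The helper below is a hypothetical sketch that fingerprints a DataFrame with `pandas.util.hash_pandas_object` and `hashlib.sha256`. It will not reproduce the logged values.

```python
import hashlib

import pandas as pd


def dataframe_digest(df: pd.DataFrame) -> str:
    """Hypothetical helper: SHA-256 fingerprint of a DataFrame's values and index.

    hash_pandas_object returns one uint64 hash per row; hashing those bytes
    yields a single digest. Column names are not part of the hash.
    """
    row_hashes = pd.util.hash_pandas_object(df, index=True).to_numpy()
    return hashlib.sha256(row_hashes.tobytes()).hexdigest()


df = pd.DataFrame({"noteId": [1, 2], "status": ["CRH", "NMR"]})
print(dataframe_digest(df))
```

Comparing such digests across workers is a cheap way to confirm that parallel scorers saw identical inputs without shipping the data itself.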
INFO:birdwatch.matrix_factorization:Num epochs: 140 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.10838139802217484 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08207494020462036 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.16098268330097198 | |
INFO:birdwatch.scorer:MFGroupScorer_1 First MF/stable init elapsed time: 85.93 secs (1.43 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_1 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.083797 | time=59.4s | |
INFO:birdwatch.matrix_factorization:epoch 80 0.44978126883506775 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.33096587657928467 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 59422, Notes: 393602 | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.02 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 26.181495520856092 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 173.4221163878698 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.5232287049293518 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.5227042436599731 | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.082702 | time=67.0s | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.70 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.60 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.62 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.76 secs (0.01 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.081970 | time=74.4s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(0.5084, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.081950 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.919948 | time=7.2s | |
INFO:birdwatch.matrix_factorization:epoch 20 0.20506006479263306 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.15459944307804108 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.909012 | time=14.3s | |
INFO:birdwatch.matrix_factorization:epoch 80 0.4497806131839752 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.33096542954444885 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.908116 | time=21.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=105 | loss=1.908076 | time=25.2s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.438201 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.355866 | time=4.3s | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.94 secs (0.58 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_1 Compute scored notes elapsed time: 46.55 secs (0.78 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.354857 | time=8.5s | |
INFO:birdwatch.matrix_factorization:epoch 40 0.15487438440322876 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11901433020830154 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.354816 | time=10.6s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(1.6690, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 3.0820, 1.9081, 0.3548 | |
INFO:birdwatch.scorer:MFGroupScorer_3 Low Diligence MF elapsed time: 114.13 secs (1.90 mins) | |
INFO:birdwatch.note_ratings:Total ratings: 5866137 post-tombstones and 1304 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 4499949, including 4499948 post-tombstones and 1 pre-tombstones. | |
INFO:birdwatch.note_ratings:Total valid ratings: 391240 | |
INFO:birdwatch.scorer:MFGroupScorer_1 Compute valid ratings elapsed time: 7.56 secs (0.13 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_1 Helpfulness scores pre-harassment elapsed time: 0.60 secs (0.01 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 54292 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 94597 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 44063 | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 39628 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 5200616 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 4149971 | |
INFO:birdwatch.scorer:MFGroupScorer_1 Filtering by helpfulness score elapsed time: 7.03 secs (0.12 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.59 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 2659054 | |
1 166306 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 1324611 | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.61 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.68 secs (0.01 mins) | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 2679514, Num Unique Notes Rated: 91114, Num Unique Raters: 37533 | |
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 2530698 | |
1 148816 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.05553842973016748 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 17.00555047844318 with BCEWithLogitsLoss | |
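The logged positive rate and pos weight can be reproduced from the post-filtering label counts above. This is a minimal sketch. It assumes the pos weight is the negative-to-positive ratio, a common way to balance `BCEWithLogitsLoss`. The log only prints the resulting value, but the numbers match that ratio.

```python
# Post-filtering counts for notHelpfulSpamHarassmentOrAbuseLabel, copied from the log.
negatives = 2530698  # label 0
positives = 148816   # label 1

# Fraction of labelled rows that carry the harassment/abuse tag.
positive_rate = positives / (negatives + positives)

# Assumption: pos weight = negatives / positives (class-balancing ratio).
pos_weight = negatives / positives

print(f"positive rate: {positive_rate}")
print(f"pos weight:    {pos_weight}")
```

Under that assumption, the value would be passed to PyTorch as `torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor(pos_weight))`. That scales the loss on positive examples by roughly 17x, so the rare tag is not swamped by the much larger negative class.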
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 37533, Notes: 91114 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
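The FutureWarning at matrix_factorization.py:187 and :193 comes from calling `fillna(..., inplace=True)` on a column pulled out of a DataFrame. The same warning appears later at helpfulness_scores.py:173. Below is a sketch of the two rewrites the warning itself suggests. It uses a hypothetical stand-in frame whose column name mirrors `c.internalNoteInterceptKey`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for noteInit; only the column name mirrors the scorer's key.
original = pd.DataFrame({"noteId": [1, 2, 3], "internalNoteIntercept": [0.4, np.nan, -0.1]})

# Pattern from the warning (chained assignment + inplace); under pandas 3.0
# Copy-on-Write it modifies a temporary copy and the fill is silently lost:
#   noteInit["internalNoteIntercept"].fillna(0.0, inplace=True)

# Option 1: assign the filled Series back to the column.
fixed_assign = original.copy()
fixed_assign["internalNoteIntercept"] = fixed_assign["internalNoteIntercept"].fillna(0.0)

# Option 2: call fillna on the DataFrame itself with a {column: value} mapping.
fixed_dict = original.copy()
fixed_dict.fillna({"internalNoteIntercept": 0.0}, inplace=True)
```

Both forms operate on the DataFrame that owns the column, so they behave the same before and after the pandas 3.0 change.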
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 29.40836753956582 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 71.39088268989956 | |
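The two averages logged above appear to be plain ratios of the counts from the preceding filtering step. Ratings per note is total ratings divided by unique notes, and ratings per user is total ratings divided by unique raters. The check below tests that assumption. The helper name is hypothetical, and it is applied to this model and to MFTopicScorer_UkraineConflict further down the log.

```python
def per_entity_averages(num_ratings: int, num_notes: int, num_raters: int) -> tuple[float, float]:
    """Return (ratings per note, ratings per rater), assuming plain total / unique-count ratios."""
    return num_ratings / num_notes, num_ratings / num_raters


# Tag-consensus model: "Num Ratings: 2679514, Num Unique Notes Rated: 91114, Num Unique Raters: 37533".
harassment = per_entity_averages(2679514, 91114, 37533)

# MFTopicScorer_UkraineConflict: "Num Ratings: 3208653, Num Unique Notes Rated: 36218, Num Unique Raters: 77558".
ukraine = per_entity_averages(3208653, 36218, 77558)

print(harassment)
print(ukraine)
```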
INFO:birdwatch.matrix_factorization:epoch 0 3.42610502243042 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.5307347774505615 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.6929018497467041 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.38166314363479614 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.1264226734638214 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09259729832410812 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.46422964334487915 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3077114522457123 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.44909703731536865 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3306475877761841 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.430682510137558 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.29592713713645935 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.42618265748023987 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.29355210065841675 | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 590255 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 574793 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 408324 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.11993777751922607 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08940132707357407 | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.45 secs (0.56 mins) | |
INFO:birdwatch.matrix_factorization:epoch 100 0.4254646897315979 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2931756377220154 | |
INFO:birdwatch.matrix_factorization:Num epochs: 118 | |
INFO:birdwatch.matrix_factorization:epoch 118 0.42537814378738403 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2931492328643799 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.23184406757354736 | |
INFO:birdwatch.scorer:MFGroupScorer_1 Harassment tag consensus elapsed time: 40.62 secs (0.68 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
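The SettingWithCopyWarning at helpfulness_scores.py:31 typically means `scoredNotes` is a filtered slice of a larger frame. In that case pandas cannot tell whether the `.loc` write is meant to reach the parent. The log does not show how `scoredNotes` is built. Assuming it comes from a boolean filter, an explicit `.copy()` is the usual fix. The frames and column values below are hypothetical.

```python
import warnings

import pandas as pd

# Hypothetical parent frame; scoredNotes is assumed to be a filtered subset of something like it.
allNotes = pd.DataFrame({"noteId": [1, 2, 3, 4], "crhBool": [True, False, True, False]})

# An explicit copy makes scoredNotes an independent object, so the later .loc
# assignment neither warns nor is ambiguous about writing back to allNotes.
scoredNotes = allNotes[allNotes["crhBool"]].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    scoredNotes.loc[:, "noteCount"] = 1
```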
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
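The `(allowed)` mismatches after `authorCounts.join(` follow from how a left join works. Participants missing on one side get NaN, and int64 cannot hold NaN, so pandas upcasts those columns to float64. The sketch below uses hypothetical data to show the upcast and two ways to undo it. The scorer's own checker already tolerates the float64 result, so this only illustrates the mechanism.

```python
import pandas as pd

# Hypothetical per-author and per-rater counts; column names mirror the log.
authorCounts = pd.DataFrame({"noteCount": [3, 1]}, index=pd.Index(["a", "b"], name="participantId"))
raterCounts = pd.DataFrame({"ratingCount": [5]}, index=pd.Index(["a"], name="participantId"))

# "b" has no ratings, so the left join fills NaN and ratingCount becomes float64.
joined = authorCounts.join(raterCounts, how="left")

# Option 1: treat a missing count as 0 and cast back to int64.
as_int = joined.fillna({"ratingCount": 0}).astype({"ratingCount": "int64"})

# Option 2: keep "missing" distinct from 0 with pandas' nullable Int64 dtype.
as_nullable = joined.astype({"ratingCount": "Int64"})
```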
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_1 Helpfulness scores post-harassment elapsed time: 0.97 secs (0.02 mins) | |
INFO:birdwatch.constants:MFGroupScorer_3: Compute tag thresholds for percentiles elapsed time: 7.74 secs (0.13 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 54292 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 94597 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 40474 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 36039 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 5200616 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 3215679 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.run_scoring:MFTopicScorer_Unassigned run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 36039, Notes: 141704 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 22.692930333653248 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 89.22775326729376 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.401736319065094 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3370682895183563 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.4490904211997986 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3306436836719513 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.11827641725540161 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08913040161132812 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.11152531951665878 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08081214874982834 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 377369 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 100691291 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 52234089 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10744062811136246 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08087494224309921 | |
INFO:birdwatch.run_scoring:MFTopicScorer_Unassigned run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFTopicScorer_Unassigned run_scorer_parallelizable: Loading data elapsed time: 19.34 secs (0.32 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFTopicScorer_Unassigned set to: 4 | |
INFO:birdwatch.scorer:Filtering ratings for MFTopicScorer_Unassigned. Original rating length: 118317340 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10690522193908691 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08003326505422592 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.11769574880599976 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08914262801408768 | |
INFO:birdwatch.scorer: Ratings after topic filter: 0 | |
INFO:birdwatch.scorer: Ratings after group filter: 0 | |
INIT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scorer.py, in prescore, at line 286: pd.DataFrame(columns=self.get_internal_scored_notes_cols()), | |
PandasTypeError: Type expectation mismatch on noteId: found=object expected=int64 | |
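The INIT ERROR at scorer.py:286 is triggered here because MFTopicScorer_Unassigned has 0 ratings after filtering, so an empty frame is built. It is consistent with how pandas creates such a frame: `pd.DataFrame(columns=...)` with no data gives every column `object` dtype, so `noteId` is not int64. Below is a sketch of casting the empty frame to explicit dtypes. The column list and dtype map are assumptions, since the real list comes from `get_internal_scored_notes_cols()`.

```python
import pandas as pd

# Hypothetical column list standing in for get_internal_scored_notes_cols().
columns = ["noteId", "internalNoteIntercept"]

# No data means pandas cannot infer types: every column is object dtype.
empty_default = pd.DataFrame(columns=columns)

# Assumed expected dtypes, for illustration only.
expected = {"noteId": "int64", "internalNoteIntercept": "float64"}
empty_typed = pd.DataFrame(columns=columns).astype(expected)
```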
INFO:birdwatch.scorer:MFTopicScorer_Unassigned Filter input elapsed time: 10.55 secs (0.18 mins) | |
INFO:birdwatch.run_scoring:MFTopicScorer_UkraineConflict run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.matrix_factorization:epoch 80 0.10672151297330856 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07912090420722961 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.10670797526836395 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0791497528553009 | |
INFO:birdwatch.matrix_factorization:Num epochs: 101 | |
INFO:birdwatch.matrix_factorization:epoch 101 0.10670797526836395 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0791497528553009 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1642342060804367 | |
INFO:birdwatch.constants:Final round MF elapsed time: 48.14 secs (0.80 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_1 prescoring, about to call diligence with 3215679 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 0000D09E403B665ADB698D8DF843CB22F352EF89ABF7CB... -0.516906 | |
1 00039991A9322D52F83399BC5B951F43B2A73869C21F10... -0.597522 | |
2 000402D0CF8FEC70E5C4BA76322215AE1A965BBE8A7568... 0.259197 | |
3 0004DC6827440EF91C141691934452677C533B6CA90AC4... -0.281706 | |
4 000618D62D26469C059F4690178D06CB5483B122126D32... 0.556171 | |
... ... ... | |
36034 FFF1B7F5E3903007BC3D5724DA6C406F78DEE26BE8456C... 0.472987 | |
36035 FFF48D8AD66904B961AF600709250FD2CB54004147EB44... -0.191947 | |
36036 FFF8367EF46CACBB9D7C020C910B12A206DAC9BA5E05A9... -0.554009 | |
36037 FFF9D85CEB466E2694589895B9D234CD48219AC8D3ADC4... -0.324200 | |
36038 FFFDEAD3B6BBA58927423C9C907473FD24FFEEACB4396E... 0.085110 | |
[36039 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 36039, vs. num we are initializing: 36039 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 36039 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=18.535381 | time=0.0s | |
INFO:birdwatch.run_scoring:MFTopicScorer_UkraineConflict run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFTopicScorer_UkraineConflict run_scorer_parallelizable: Loading data elapsed time: 19.64 secs (0.33 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFTopicScorer_UkraineConflict set to: 4 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.746414 | time=7.3s | |
INFO:birdwatch.matrix_factorization:epoch 140 0.1173965111374855 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08920645713806152 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.44898372888565063 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.33060288429260254 | |
INFO:birdwatch.scorer:Filtering ratings for MFTopicScorer_UkraineConflict. Original rating length: 118317340 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.315688 | time=14.5s | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 377369, Notes: 1203828 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.254010 | time=21.9s | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 43.389993420987054 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 138.41648095100552 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.377983421087265 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.31783124804496765 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.238822 | time=29.1s | |
INFO:birdwatch.matrix_factorization:Num epochs: 118 | |
INFO:birdwatch.matrix_factorization:epoch 160 0.11718665808439255 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08927183598279953 | |
INFO:birdwatch.matrix_factorization:epoch 118 0.44898954033851624 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3306144177913666 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.20237286388874054 | |
INFO:birdwatch.scorer:MFExpansionScorer Harassment tag consensus elapsed time: 614.20 secs (10.24 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.233021 | time=36.4s | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.matrix_factorization:Num epochs: 128 | |
INFO:birdwatch.matrix_factorization:epoch 128 0.44897085428237915 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.33059483766555786 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.20240214467048645 | |
INFO:birdwatch.scorer:MFExpansionPlusScorer Harassment tag consensus elapsed time: 647.35 secs (10.79 mins) | |
INFO:birdwatch.scorer: Ratings after topic filter: 4092086 | |
INFO:birdwatch.scorer: Ratings after group filter: 4092086 | |
INFO:birdwatch.scorer:MFTopicScorer_UkraineConflict Filter input elapsed time: 35.35 secs (0.59 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.230194 | time=43.6s | |
INFO:birdwatch.mf_base_scorer:ratings summary MFTopicScorer_UkraineConflict: a07339ba6a72dc2c4115bdfd8c4ce154062a7bfadc759904f0ff6270aab95de4 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 3208653, Num Unique Notes Rated: 36218, Num Unique Raters: 77558 | |
INFO:birdwatch.scorer:MFTopicScorer_UkraineConflict Prepare ratings elapsed time: 2.04 secs (0.03 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.228630 | time=50.6s | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFTopicScorer_UkraineConflict: 4e6394198c1760cfe6801faecda9539cf9dea7451ca93730b768dda4dc90c56b | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFTopicScorer_UkraineConflict: af3dc223c4a793d52240ecd18d39a5b43e2a4c289b5170b5b24c062c2d70098a | |
INFO:birdwatch.matrix_factorization:epoch 180 0.11701510846614838 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0893414095044136 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFTopicScorer_UkraineConflict: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614 | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 77558, Notes: 36218 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 88.59277155005799 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 41.371012661492045 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.66077995300293 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.194893836975098 | |
INFO:birdwatch.scorer:MFExpansionScorer Helpfulness scores post-harassment elapsed time: 21.76 secs (0.36 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.227660 | time=57.7s | |
INFO:birdwatch.matrix_factorization:epoch 20 0.32391706109046936 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2694520056247711 | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFExpansionPlusScorer Helpfulness scores post-harassment elapsed time: 22.63 secs (0.38 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.226992 | time=64.7s | |
INFO:birdwatch.matrix_factorization:epoch 40 0.12024037539958954 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08416181802749634 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09186814725399017 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06257833540439606 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.226514 | time=71.8s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(1.0268, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.226501 | time=0.0s | |
INFO:birdwatch.matrix_factorization:epoch 200 0.11687801778316498 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0894116759300232 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.08778908848762512 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.060036398470401764 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.036354 | time=6.9s | |
INFO:birdwatch.matrix_factorization:epoch 100 0.0873025581240654 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05966803804039955 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.025881 | time=13.8s | |
INFO:birdwatch.matrix_factorization:epoch 120 0.08723433315753937 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05969322472810745 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.025078 | time=20.7s | |
INFO:birdwatch.matrix_factorization:epoch 20 0.11137835681438446 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08220585435628891 | |
INFO:birdwatch.matrix_factorization:epoch 220 0.11676009744405746 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0894821360707283 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=2.025053 | time=23.1s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.396995 | time=0.0s | |
INFO:birdwatch.matrix_factorization:epoch 140 0.08722575753927231 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05966167896986008 | |
INFO:birdwatch.matrix_factorization:Num epochs: 143 | |
INFO:birdwatch.matrix_factorization:epoch 143 0.0872255191206932 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0596679225564003 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.15083946287631989 | |
INFO:birdwatch.scorer:MFTopicScorer_UkraineConflict First MF/stable init elapsed time: 44.76 secs (0.75 mins) | |
INFO:birdwatch.mf_base_scorer:Skipping rep-filtering in prescoring for MFTopicScorer_UkraineConflict | |
/home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py:573: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
helpfulnessScores[ | |
INFO:birdwatch.mf_base_scorer:In MFTopicScorer_UkraineConflict prescoring, about to call diligence with 3208653 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.315150 | time=4.0s | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 056B1936908F42285AC8A4E4CD928C9BC3DAD8547FEE39... -0.729118 | |
1 F35972BBD2F99515FD974E9C7AFD899970F2E4A5911513... 0.156897 | |
2 67B54620C2319FCDE70894F7B1D89C882952908664A35D... -0.815295 | |
3 E23374E04DD1B97ED5E4BE68F56CD25AE5DE53DD2A3541... -0.219230 | |
4 E462D40CC316ED0864D77A36DA000DA98A8A6F61C204DE... -0.774939 | |
... ... ... | |
77553 FE9609DDEF180E906BB41137EB796FA99A971692115E38... 0.497978 | |
77554 2106072920573EAC8033CACA80917F1E31B046A64BF772... -0.562641 | |
77555 7B4C291871CF7E6FBAEE1C0F3DCBC6978FD32D56EA227C... -0.754790 | |
77556 E8E0085E9629E94B7F3D1757968E2E51A773585C7F0BD0... -0.094775 | |
77557 25B6B88DAE2AD3680710497AF96A0FD9B23C4997711E47... -0.405464 | |
[77558 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 77558, vs. num we are initializing: 77558 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 77558 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=20.426998 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.314164 | time=8.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.314122 | time=10.1s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(2.2284, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 3.2265, 2.0251, 0.3141 | |
INFO:birdwatch.scorer:MFGroupScorer_1 Low Diligence MF elapsed time: 108.57 secs (1.81 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.756299 | time=7.8s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.60 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.68 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.61 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.262148 | time=15.5s | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.58 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.matrix_factorization:epoch 240 0.11665724217891693 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08955366909503937 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.197178 | time=23.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.183473 | time=30.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.178978 | time=38.5s | |
INFO:birdwatch.matrix_factorization:epoch 260 0.11656199395656586 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08962821215391159 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.177069 | time=46.1s | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.59 secs (0.56 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.176133 | time=53.8s | |
INFO:birdwatch.constants:MFGroupScorer_1: Compute tag thresholds for percentiles elapsed time: 7.10 secs (0.12 mins) | |
INFO:birdwatch.matrix_factorization:epoch 280 0.1164797842502594 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08969981968402863 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.175637 | time=61.5s | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10973301529884338 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08385898917913437 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.run_scoring:MFTopicScorer_GazaConflict run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.175357 | time=69.6s | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 747974 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 712504 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 542045 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.175201 | time=77.9s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(1.3211, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.175198 | time=0.0s | |
INFO:birdwatch.matrix_factorization:epoch 300 0.11640468239784241 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08977333456277847 | |
INFO:birdwatch.run_scoring:MFTopicScorer_GazaConflict run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFTopicScorer_GazaConflict run_scorer_parallelizable: Loading data elapsed time: 18.92 secs (0.32 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFTopicScorer_GazaConflict set to: 4 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.833054 | time=7.5s | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 747994 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 712511 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 542394 | |
INFO:birdwatch.scorer:Filtering ratings for MFTopicScorer_GazaConflict. Original rating length: 118317340 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.825998 | time=14.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.824899 | time=22.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.824899 | time=22.3s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.292622 | time=0.0s | |
INFO:birdwatch.matrix_factorization:epoch 320 0.11634308099746704 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08984430134296417 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.203768 | time=4.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.202722 | time=8.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.202677 | time=10.9s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(3.1824, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 3.1752, 1.8249, 0.2027 | |
INFO:birdwatch.scorer:MFTopicScorer_UkraineConflict Low Diligence MF elapsed time: 114.47 secs (1.91 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 0.71 secs (0.01 mins) | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 489451 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 116567689 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 61497058 | |
INFO:birdwatch.constants:MFTopicScorer_UkraineConflict: Compute tag thresholds for percentiles elapsed time: 5.33 secs (0.09 mins) | |
INFO:birdwatch.scorer: Ratings after topic filter: 12081809 | |
INFO:birdwatch.scorer: Ratings after group filter: 12081809 | |
INFO:birdwatch.scorer:MFTopicScorer_GazaConflict Filter input elapsed time: 37.81 secs (0.63 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 489800 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 116569833 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 61623294 | |
INFO:birdwatch.run_scoring:MFTopicScorer_MessiRonaldo run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.matrix_factorization:epoch 340 0.11628181487321854 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08991576731204987 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10881572216749191 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08203759789466858 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFTopicScorer_GazaConflict: 723082fc06a9c0e97530557ad8a8003fcc3f188f1766118f3f407514620ef7b2 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 10946843, Num Unique Notes Rated: 96842, Num Unique Raters: 161638 | |
INFO:birdwatch.scorer:MFTopicScorer_GazaConflict Prepare ratings elapsed time: 5.86 secs (0.10 mins) | |
INFO:birdwatch.run_scoring:MFTopicScorer_MessiRonaldo run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFTopicScorer_MessiRonaldo run_scorer_parallelizable: Loading data elapsed time: 18.77 secs (0.31 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFTopicScorer_MessiRonaldo set to: 4 | |
INFO:birdwatch.matrix_factorization:epoch 360 0.11623252928256989 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08997941762208939 | |
INFO:birdwatch.scorer:Filtering ratings for MFTopicScorer_MessiRonaldo. Original rating length: 118317340 | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFTopicScorer_GazaConflict: fb2b6b467c852162acc41f9b86605716ca1bf82801cd8e72658a399bb84af0c0 | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFTopicScorer_GazaConflict: 941d65797968844829bb0758ed7975894ac49da4cf4d6e543ea0b048d5571a66 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFTopicScorer_GazaConflict: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 161638, Notes: 96842 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 113.03817558497346 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 67.72443979757236 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.699262619018555 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.248696804046631 | |
INFO:birdwatch.matrix_factorization:epoch 380 0.11618795990943909 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0900399312376976 | |
INFO:birdwatch.scorer: Ratings after topic filter: 199000 | |
INFO:birdwatch.scorer: Ratings after group filter: 199000 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 489451, Notes: 1295054 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
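The `FutureWarning` above points at `noteInit[col].fillna(0.0, inplace=True)`. Under pandas copy-on-write, which becomes the default in pandas 3.0, `noteInit[col]` behaves as a copy. An in-place fill on that copy would therefore never reach `noteInit`.

The sketch below shows the two rewrites the warning itself suggests. It assumes that the constants `c.internalNoteInterceptKey` and `c.note_factor_key(1)` resolve to the column names `internalNoteIntercept` and `internalNoteFactor1`, which appear elsewhere in this log. The data is hypothetical.

```python
import pandas as pd

# Hypothetical initial state with missing values to fill.
noteInit = pd.DataFrame(
    {"internalNoteIntercept": [0.3, None], "internalNoteFactor1": [None, -0.2]}
)

# Pattern flagged by the warning (chained assignment with inplace=True):
#   noteInit["internalNoteIntercept"].fillna(0.0, inplace=True)

# Rewrite 1: assign the filled Series back to the column.
noteInit["internalNoteIntercept"] = noteInit["internalNoteIntercept"].fillna(0.0)

# Rewrite 2: call fillna on the DataFrame itself with a per-column dict.
noteInit.fillna({"internalNoteFactor1": 0.0}, inplace=True)
```

Both forms modify `noteInit` itself rather than an intermediate object, so they behave the same with or without copy-on-write.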
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 47.486095560494 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 125.64497365415536 | |
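The "Ratings per note" and "Ratings per user" figures are consistent with a plain division: the training-rating count divided by the number of notes or users. For MFTopicScorer_GazaConflict, the numerator is "Num Ratings: 10946843". For the run logged with "Users: 489451, Notes: 1295054", it is "Number of Ratings for Final Training: 61497058".

This pairing is inferred from the matching values, not from the scorer source. The check below reproduces the logged numbers.

```python
# Counts copied from this log; which count feeds which ratio is an inference.
def ratings_per(num_ratings: int, num_notes: int, num_users: int) -> tuple[float, float]:
    """Return (ratings per note, ratings per user) as simple averages."""
    return num_ratings / num_notes, num_ratings / num_users


cases = {
    "MFTopicScorer_GazaConflict": (10946843, 96842, 161638),
    "run with Users: 489451, Notes: 1295054": (61497058, 1295054, 489451),
}

for name, counts in cases.items():
    per_note, per_user = ratings_per(*counts)
    print(f"{name}: {per_note:.4f} per note, {per_user:.4f} per user")
```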
INFO:birdwatch.scorer:MFTopicScorer_MessiRonaldo Filter input elapsed time: 30.09 secs (0.50 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFTopicScorer_MessiRonaldo: 17e195233e4335626a97343c1889685dd70059054629339200f9d331ea1bd254 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 62067, Num Unique Notes Rated: 2402, Num Unique Raters: 2779 | |
INFO:birdwatch.scorer:MFTopicScorer_MessiRonaldo Prepare ratings elapsed time: 0.15 secs (0.00 mins) | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFTopicScorer_MessiRonaldo: a217d6c62c7d55fb45a1423ad1f7ad48919a7589b62809408d90532b0fc53eae | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFTopicScorer_MessiRonaldo: 8f15330e9e4bd02a2dcedfaaa02394dcbd9eb4e266eea0ca92faefd1b4025517 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFTopicScorer_MessiRonaldo: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 2779, Notes: 2402 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 25.83971690258118 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 22.334292911119107 | |
INFO:birdwatch.matrix_factorization:epoch 0 6.959341049194336 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 6.453376770019531 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.35005903244018555 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2881944179534912 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.11375442892313004 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07303179055452347 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.08148330450057983 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.04638390243053436 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.07666441798210144 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0425170361995697 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.076076939702034 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.04200398176908493 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.07599727809429169 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.041975054889917374 | |
INFO:birdwatch.matrix_factorization:epoch 140 0.07598765194416046 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.041963379830121994 | |
INFO:birdwatch.matrix_factorization:Num epochs: 144 | |
INFO:birdwatch.matrix_factorization:epoch 144 0.07598724961280823 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.04195939376950264 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.167100727558136 | |
INFO:birdwatch.scorer:MFTopicScorer_MessiRonaldo First MF/stable init elapsed time: 1.46 secs (0.02 mins) | |
INFO:birdwatch.mf_base_scorer:Skipping rep-filtering in prescoring for MFTopicScorer_MessiRonaldo | |
/home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py:573: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
helpfulnessScores[ | |
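The `SettingWithCopyWarning` at `mf_base_scorer.py:573` is raised when code assigns a column on a DataFrame that pandas suspects is a slice of another frame. The log truncates the offending statement after `helpfulnessScores[`.

The sketch below therefore uses a hypothetical filter and hypothetical column names (`score`, `isIncluded`). It only illustrates the two usual remedies, not the scorer's actual statement.

```python
import pandas as pd

# Hypothetical data; only raterParticipantId appears in this log.
helpfulnessScores = pd.DataFrame(
    {"raterParticipantId": ["a", "b", "c"], "score": [0.1, 0.5, 0.9]}
)
mask = helpfulnessScores["score"] > 0.2

# Remedy 1: when a separate subset is wanted, take an explicit copy before
# adding columns, so pandas knows the writes are not meant for the parent.
included = helpfulnessScores[mask].copy()
included["isIncluded"] = True

# Remedy 2: when the parent frame should change, write through a single .loc
# call; rows outside the mask get NaN in the new column.
helpfulnessScores.loc[mask, "isIncluded"] = True
```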
INFO:birdwatch.mf_base_scorer:In MFTopicScorer_MessiRonaldo prescoring, about to call diligence with 62067 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 90D27164DF30535EDB518FAD15DEE8728388F8CA14C75E... -0.772900 | |
1 4E44139DB610989839A095579EBA2EF46825BF25E13FFD... 0.231995 | |
2 2E31629F722BF87A215706A6311E21E123A4624B4D10E2... -0.758551 | |
3 6745B794E9C46A45ABF33E250B5053EC684C28F888355F... -0.318866 | |
4 5C923A1ACF69C684AFABDF63F42734BDCC5FE2B8E3611A... -0.628704 | |
... ... ... | |
2774 36ABBE1781AD22B5AC38F261D170BF8ADEE815FE60A143... 0.172658 | |
2775 3F13441FD8CF3294E04E38355EC53FAE3FE8C68CE6C8F7... 0.431192 | |
2776 C579A175CA58969A611D429805EC38759B99F3378627BC... 0.187552 | |
2777 6406A84AB54616A3BFF054E5D78B32D8836721FDCD72B8... -0.586839 | |
2778 2A85A2042F4E35DAF08071EDAF3B34614F6FD4726CC9BB... -0.472816 | |
[2779 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 2779, vs. num we are initializing: 2779 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 2779 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=18.600965 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.327718 | time=0.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.787355 | time=0.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.691185 | time=0.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.666158 | time=1.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.656056 | time=1.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.650549 | time=1.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.647326 | time=1.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=2.645029 | time=2.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=2.643422 | time=2.2s | |
INFO:birdwatch.matrix_factorization:epoch 0 0.3724973797798157 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3125559687614441 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=2.642532 | time=2.5s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(0.1515, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=2.642508 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.323409 | time=0.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.317610 | time=0.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.316520 | time=0.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.316520 | time=0.7s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.409433 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.305268 | time=0.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.304056 | time=0.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.304001 | time=0.4s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(1.8809, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 2.6425, 1.3165, 0.3040 | |
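Each `Low diligence training loss` summary lists three numbers. In this log they match, to four decimals, the last loss printed in rounds 1, 2 and 3 of the preceding reputation matrix factorization. For MFTopicScorer_MessiRonaldo those losses are 2.642532, 1.316520 and 0.304001.

The sketch below recovers those values from an abridged excerpt of the log above. Its regular expressions assume the `Round N:` and `epoch=NNN | loss=X` formats shown here; they are illustrative and not part of the scorer.

```python
import re

# Abridged excerpt of the MFTopicScorer_MessiRonaldo low-diligence run above.
LOG = """\
INFO:birdwatch.reputation_matrix_factorization:Round 1:
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=18.600965 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=2.642532 | time=2.5s
INFO:birdwatch.reputation_matrix_factorization:
Round 2: learn rater rep (and everything else), freeze note intercept
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=2.642508 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.316520 | time=0.7s
INFO:birdwatch.reputation_matrix_factorization:
Round 3: fit intercepts and global intercept with everything else frozen
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.409433 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.304001 | time=0.4s
"""

ROUND_RE = re.compile(r"Round (\d+):")
EPOCH_RE = re.compile(r"epoch=(\d+) \| loss=([0-9.]+)")


def final_loss_per_round(text: str) -> list[float]:
    """Return the last logged loss of each round, ordered by round number."""
    last: dict[int, float] = {}
    current = None
    for line in text.splitlines():
        if (m := ROUND_RE.search(line)):
            current = int(m.group(1))  # a new training round starts here
        elif current is not None and (m := EPOCH_RE.search(line)):
            last[current] = float(m.group(2))  # keep overwriting; last one wins
    return [last[r] for r in sorted(last)]


print(", ".join(f"{x:.4f}" for x in final_loss_per_round(LOG)))
```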
INFO:birdwatch.scorer:MFTopicScorer_MessiRonaldo Low Diligence MF elapsed time: 3.67 secs (0.06 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 0.08 secs (0.00 mins) | |
INFO:birdwatch.constants:MFTopicScorer_MessiRonaldo: Compute tag thresholds for percentiles elapsed time: 0.24 secs (0.00 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.run_scoring:MFMultiGroupScorer_1 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.matrix_factorization:epoch 20 0.3809957504272461 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.32407498359680176 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 489800, Notes: 1295080 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 47.582615745745436 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 125.81317680685994 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.3725482225418091 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3126033842563629 | |
INFO:birdwatch.matrix_factorization:epoch 400 0.11614863574504852 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0901012197136879 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.10870363563299179 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08158905804157257 | |
INFO:birdwatch.run_scoring:MFMultiGroupScorer_1 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFMultiGroupScorer_1 run_scorer_parallelizable: Loading data elapsed time: 19.29 secs (0.32 mins) | |
INFO:birdwatch.scorer:prescore: Torch intra-op parallelism for MFMultiGroupScorer_1 set to: 4 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.11090180277824402 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0778173878788948 | |
INFO:birdwatch.scorer:Filtering ratings for MFMultiGroupScorer_1. Original rating length: 118317340 | |
INFO:birdwatch.scorer: Ratings after topic filter: 118317340 | |
INFO:birdwatch.matrix_factorization:Num epochs: 416 | |
INFO:birdwatch.matrix_factorization:epoch 416 0.11612030118703842 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09014349430799484 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.08644558489322662 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05855761840939522 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 59422, Notes: 393602 | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.02 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:epoch 0 0.12109839171171188 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.5229554772377014 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.08238106966018677 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05536932498216629 | |
INFO:birdwatch.scorer: Ratings after group filter: 6123382 | |
INFO:birdwatch.scorer:MFMultiGroupScorer_1 Filter input elapsed time: 45.11 secs (0.75 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.mf_base_scorer:ratings summary MFMultiGroupScorer_1: 20d18fde99178b622769a363d9f004f086b191c68cf48a19beee4150ad1d0426 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 5276919, Num Unique Notes Rated: 222841, Num Unique Raters: 54464 | |
INFO:birdwatch.scorer:MFMultiGroupScorer_1 Prepare ratings elapsed time: 2.64 secs (0.04 mins) | |
INFO:birdwatch.matrix_factorization:epoch 20 0.11209049820899963 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08320221304893494 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.06309746205806732 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1838725507259369 | |
INFO:birdwatch.mf_base_scorer:ratingsForTraining summary MFMultiGroupScorer_1: 11517728fed2ccc5571312d84787362f4d263b57d1b607df03a32c9c8595f1de | |
INFO:birdwatch.mf_base_scorer:noteStatusHistory summary MFMultiGroupScorer_1: bab26056c2c51b2beff057f4e0fe86fa071404051fc10055597e708318a26b55 | |
INFO:birdwatch.mf_base_scorer:userEnrollmentRaw summary MFMultiGroupScorer_1: 1b2a6ad46dcfd66350ebeb2e58eebae04708074b0cb986ec945580276c124614 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.08187726885080338 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05496150255203247 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 54464, Notes: 222841 | |
INFO:birdwatch.matrix_factorization:learning rate set to :1.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 23.680197988700463 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 96.88820138072856
INFO:birdwatch.matrix_factorization:epoch 0 6.21956205368042
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 5.767576694488525
INFO:birdwatch.matrix_factorization:epoch 20 0.11210079491138458
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0832115039229393
INFO:birdwatch.matrix_factorization:epoch 100 0.10869144648313522
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08150900155305862
INFO:birdwatch.matrix_factorization:Num epochs: 101
INFO:birdwatch.matrix_factorization:epoch 101 0.10869144648313522
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08150900155305862
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1698254644870758
INFO:birdwatch.constants:Final round MF elapsed time: 535.63 secs (8.93 mins)
INFO:birdwatch.mf_base_scorer:In MFCoreScorer prescoring, about to call diligence with 52234089 final round ratings.
INFO:birdwatch.matrix_factorization:epoch 20 0.3477414846420288
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2775827944278717
INFO:birdwatch.matrix_factorization:epoch 40 0.05974303185939789
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1947803646326065
INFO:birdwatch.matrix_factorization:epoch 120 0.08180892467498779
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.054906755685806274
INFO:birdwatch.matrix_factorization:epoch 40 0.14639396965503693
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.112333282828331
INFO:birdwatch.matrix_factorization:epoch 60 0.12049440294504166
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09189274907112122
INFO:birdwatch.matrix_factorization:epoch 60 0.053542256355285645
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.15519458055496216
INFO:birdwatch.matrix_factorization:epoch 140 0.08180093765258789
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.054893944412469864
INFO:birdwatch.matrix_factorization:Num epochs: 146
INFO:birdwatch.matrix_factorization:epoch 80 0.11436828970909119
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08636156469583511
INFO:birdwatch.matrix_factorization:epoch 146 0.08180038630962372
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05489334836602211
INFO:birdwatch.matrix_factorization:Global Intercept: 0.14470213651657104
INFO:birdwatch.scorer:MFTopicScorer_GazaConflict First MF/stable init elapsed time: 157.05 secs (2.62 mins)
INFO:birdwatch.mf_base_scorer:Skipping rep-filtering in prescoring for MFTopicScorer_GazaConflict
/home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py:573: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
helpfulnessScores[
INFO:birdwatch.mf_base_scorer:In MFTopicScorer_GazaConflict prescoring, about to call diligence with 10946843 final round ratings.
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
None,
raterInitState:
raterParticipantId internalRaterFactor1
0 F35972BBD2F99515FD974E9C7AFD899970F2E4A5911513... 0.703563
1 9E93A0C21A1CD3DD7C3A772E71A2DD0B6E79103B020A32... 0.786065
2 EF12C150CE8A147E0804CFBEA80649018A15435E54C4E5... 0.819809
3 EBDCB80B1EC4A9FB51C8A562377D72F9569692DEFFC8BC... 0.645086
4 70B62959F72CA22F3697BD4E5674B3990AD91893FD9320... 0.775824
... ... ...
161633 7B4C291871CF7E6FBAEE1C0F3DCBC6978FD32D56EA227C... 0.551154
161634 CCEEA2235CEE1B03C011F3EF3ECF769A3D9A9D8CE623D4... -0.314116
161635 C8C92EF65FB156E1A09F1C7468D0BB036225DE3927E945... -0.542172
161636 7C60F353091E8F57A620BC71CF1B2A8C810EA76EC08066... -0.466760
161637 544C40FD5CB0A723BA61EAEB9EF5EE1BE14D558CDBA69D... -0.610971
[161638 rows x 2 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 161638, vs. num we are initializing: 161638
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 161638
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen
INFO:birdwatch.reputation_matrix_factorization:Round 1:
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=24.686903 | time=0.2s
INFO:birdwatch.matrix_factorization:epoch 100 0.11363682150840759
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08582276105880737
INFO:birdwatch.matrix_factorization:epoch 80 0.04737416282296181
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.12153952568769455
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
None,
raterInitState:
raterParticipantId internalRaterFactor1
0 000011269AD6F327AED0F4086A732B4052F9D28E8791E1... -0.175260
1 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... 0.694234
2 00005300B9017670433392BF6767238D54E058EC25D5C5... -0.286660
3 00007B885907790E492F8C9A31F1AFC20831279328C263... 0.431962
4 0000AE9A69E1B5D132C053E253DC42A007EDE2F11C39CF... 0.417850
... ... ...
377364 FFFFA008A90B7144EF2CC117355D4B4743C471CA9B2DCA... 0.493541
377365 FFFFAB2FDBC1968F4CFE97A86D88963D702B636365B6CD... 0.023487
377366 FFFFBBAB3C66ABB4DBC2A3B486C3C673345C89B5858465... 0.200495
377367 FFFFC819886B2F837503D840D59EE8321A835AAF2B5C1E... -0.186171
377368 FFFFFE8909485374E33854B934713713CAC93CDB50C9D0... 0.525469
[377369 rows x 2 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 377369, vs. num we are initializing: 377369
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 377369
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen
INFO:birdwatch.reputation_matrix_factorization:Round 1:
INFO:birdwatch.matrix_factorization:epoch 120 0.11355504393577576
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08573752641677856
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=15.843794 | time=0.5s
INFO:birdwatch.matrix_factorization:epoch 40 0.11087857186794281
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08510537445545197
INFO:birdwatch.matrix_factorization:epoch 40 0.11088366061449051
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0851113572716713
INFO:birdwatch.matrix_factorization:epoch 140 0.1135435625910759
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0857313796877861
INFO:birdwatch.matrix_factorization:Num epochs: 148
INFO:birdwatch.matrix_factorization:epoch 148 0.11354270577430725
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08573180437088013
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17076964676380157
INFO:birdwatch.scorer:MFMultiGroupScorer_1 First MF/stable init elapsed time: 86.26 secs (1.44 mins)
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFMultiGroupScorer_1
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.989847 | time=29.0s
INFO:birdwatch.matrix_factorization:epoch 100 0.0452355295419693
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11535946279764175
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
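Note on the `(allowed)` dtype mismatches above: they look like the standard pandas upcast, where a left merge leaves some notes without a match, the new integer columns receive NaN, and NaN forces int64 to float64. A minimal sketch with small hypothetical frames (`noteStats` and `ratingCounts` here are toy stand-ins, not the scorer's data) showing the upcast and one way back to int64 when 0 is the correct default:

```python
import pandas as pd

# Hypothetical stand-ins: every note, plus per-note rating counts for only some notes.
noteStats = pd.DataFrame({"noteId": [1, 2, 3]})
ratingCounts = pd.DataFrame({"noteId": [1, 3], "numRatings": [5, 2]})

# A left merge keeps note 2, but it has no match, so numRatings is NaN there.
merged = noteStats.merge(ratingCounts, on="noteId", how="left")
print(merged["numRatings"].dtype)  # float64, because int64 cannot hold NaN

# If "no ratings" should mean 0, fill the gaps and cast back to int64.
merged["numRatings"] = merged["numRatings"].fillna(0).astype("int64")
print(merged["numRatings"].tolist())  # [5, 0, 2]
```

If a missing count should stay distinguishable from a real 0, casting to pandas' nullable `Int64` dtype (`astype("Int64")`) keeps the gaps as `<NA>` instead. The same cause likely explains the `crhBool`, `noteCount`, and `ratingCount` mismatches reported for the `authorCounts.join(...)` call.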
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.66 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.75 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.64 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.65 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.73 secs (0.01 mins)
INFO:birdwatch.matrix_factorization:epoch 120 0.044440120458602905
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11451677232980728
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.392782 | time=58.3s
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.27 secs (0.57 mins)
INFO:birdwatch.scorer:MFMultiGroupScorer_1 Compute scored notes elapsed time: 46.63 secs (0.78 mins)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed)
INFO:birdwatch.matrix_factorization:epoch 140 0.04405667260289192
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11405300348997116
INFO:birdwatch.note_ratings:Total ratings: 6122270 post-tombstones and 1112 pre-tombstones
INFO:birdwatch.note_ratings:Total ratings created before statuses: 4838988, including 4838981 post-tombstones and 7 pre-tombstones.
INFO:birdwatch.note_ratings:Total valid ratings: 559088
INFO:birdwatch.scorer:MFMultiGroupScorer_1 Compute valid ratings elapsed time: 7.10 secs (0.12 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
scoredNotes.loc[:, c.noteCountKey] = 1
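Note on the warning above: it fires even though line 31 already uses `.loc`, which suggests `scoredNotes` is itself a filtered slice of another frame. In pandas 2.x the usual fix is an explicit `.copy()` at the point where the subset is created. A minimal sketch, assuming that origin (`allNotes` and the filter condition are hypothetical, not taken from the scorer):

```python
import pandas as pd

# Hypothetical stand-in for the larger frame that scoredNotes is cut from.
allNotes = pd.DataFrame({"noteId": [1, 2, 3], "status": ["CRH", "NMR", "CRNH"]})

# Boolean filtering returns a frame that pandas 2.x may still treat as derived
# from allNotes; assigning into it, even via .loc, is what raises the warning.
# An explicit .copy() makes the subset an independent frame.
scoredNotes = allNotes[allNotes["status"] != "NMR"].copy()
scoredNotes.loc[:, "noteCount"] = 1

print(scoredNotes["noteCount"].tolist())  # [1, 1]
print("noteCount" in allNotes.columns)    # False: the original frame is untouched
```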
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join(
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed)
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed)
INFO:birdwatch.scorer:MFMultiGroupScorer_1 Helpfulness scores pre-harassment elapsed time: 0.74 secs (0.01 mins)
INFO:birdwatch.helpfulness_scores:Unique Raters: 54464
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 125134
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 49534
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.305246 | time=87.3s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.703397 | time=76.2s
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 43574
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 5276919
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 4382189
INFO:birdwatch.scorer:MFMultiGroupScorer_1 Filtering by helpfulness score elapsed time: 7.10 secs (0.12 mins)
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse-------------------
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel
0 2798897
1 252529
dtype: int64
INFO:birdwatch.tag_consensus:Number of rows with no tag label 1330763
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note:
Num Ratings: 2814502, Num Unique Notes Rated: 143216, Num Unique Raters: 40533
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel
0 2600687
1 213815
dtype: int64
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.07596903466403648
INFO:birdwatch.matrix_factorization:Using pos weight: 12.163257956644763 with BCEWithLogitsLoss
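Note: the positive rate and pos weight above are consistent with simple ratios of the post-filtering label counts. The positive rate equals positives divided by all labelled rows, and the pos weight equals negatives divided by positives, the usual choice for the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss` on imbalanced labels. A quick arithmetic check using only numbers from the log (it reproduces the values; it is not copied from the scorer's source):

```python
# Post-filtering label counts for notHelpfulSpamHarassmentOrAbuseLabel, copied from the log.
negatives = 2600687  # label 0
positives = 213815   # label 1

total = negatives + positives        # matches "Num Ratings: 2814502" after filtering
positive_rate = positives / total    # log reports 0.07596903466403648
pos_weight = negatives / positives   # log reports 12.163257956644763

print(total, positive_rate, pos_weight)
```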
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 40533, Notes: 143216
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
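Note on the `FutureWarning` above: the two rewrites it suggests can be sketched on a hypothetical `noteInit` frame with an `internalNoteIntercept` column (the scorer's real key is `c.internalNoteInterceptKey`; its string value is assumed here). Either form alone is enough; both are shown for comparison:

```python
import pandas as pd

# Hypothetical stand-in for noteInit: one note has no initial intercept (NaN).
noteInit = pd.DataFrame({"noteId": [1, 2], "internalNoteIntercept": [0.3, None]})

# Chained form flagged by the warning (no reliable write-back under Copy-on-Write):
#   noteInit["internalNoteIntercept"].fillna(0.0, inplace=True)

# Option 1: assign the filled column back explicitly.
noteInit["internalNoteIntercept"] = noteInit["internalNoteIntercept"].fillna(0.0)

# Option 2: call fillna on the frame itself with a {column: value} mapping.
noteInit.fillna({"internalNoteIntercept": 0.0}, inplace=True)

print(noteInit["internalNoteIntercept"].tolist())  # [0.3, 0.0]
```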
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:learning rate set to :2.0
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 19.652147804714556
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 69.43729800409542
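Note: the two averages above are consistent with dividing the post-filtering rating count by the number of notes and of users reported for this model. A quick arithmetic check using only numbers from the log:

```python
# Counts copied from the log for the notHelpfulSpamHarassmentOrAbuse tag model.
num_ratings = 2814502  # "Num Ratings" after min-ratings filtering
num_notes = 143216     # "Notes"
num_users = 40533      # "Users"

ratings_per_note = num_ratings / num_notes  # log reports 19.652147804714556
ratings_per_user = num_ratings / num_users  # log reports 69.43729800409542
print(ratings_per_note, ratings_per_user)
```

The later final-round model checks out the same way: 3500898 ratings over 222667 notes and 39083 users gives the logged 15.7226 and 89.5760.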
INFO:birdwatch.matrix_factorization:epoch 0 3.4466817378997803
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.5445270538330078
INFO:birdwatch.matrix_factorization:epoch 60 0.10976183414459229
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0828113704919815
INFO:birdwatch.matrix_factorization:epoch 160 0.04380316287279129
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11368976533412933
INFO:birdwatch.matrix_factorization:epoch 20 0.6939966082572937
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.40161973237991333
INFO:birdwatch.matrix_factorization:epoch 60 0.1097688376903534
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08282257616519928
INFO:birdwatch.matrix_factorization:epoch 40 0.49634698033332825
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3338276147842407
INFO:birdwatch.matrix_factorization:epoch 60 0.4679514765739441
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.32417595386505127
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.286071 | time=116.5s
INFO:birdwatch.matrix_factorization:epoch 80 0.46407079696655273
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.32240691781044006
INFO:birdwatch.matrix_factorization:epoch 100 0.46353405714035034
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.32215869426727295
INFO:birdwatch.matrix_factorization:epoch 180 0.043628476560115814
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1134783998131752
INFO:birdwatch.matrix_factorization:Num epochs: 112
INFO:birdwatch.matrix_factorization:epoch 112 0.4634898900985718
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.322145015001297
INFO:birdwatch.matrix_factorization:Global Intercept: -0.23346905410289764
INFO:birdwatch.scorer:MFMultiGroupScorer_1 Harassment tag consensus elapsed time: 38.94 secs (0.65 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
scoredNotes.loc[:, c.noteCountKey] = 1
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join(
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed)
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True)
INFO:birdwatch.scorer:MFMultiGroupScorer_1 Helpfulness scores post-harassment elapsed time: 1.20 secs (0.02 mins)
INFO:birdwatch.helpfulness_scores:Unique Raters: 54464
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 125134
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 45043
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 39083
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 5276919
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 3500898
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 39083, Notes: 222667
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:learning rate set to :0.2
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 15.722572271598397
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 89.57597932604969
INFO:birdwatch.matrix_factorization:epoch 0 0.3783248960971832
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.31209078431129456
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.279547 | time=145.2s
INFO:birdwatch.matrix_factorization:epoch 20 0.11124187707901001
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08050009608268738
INFO:birdwatch.matrix_factorization:epoch 200 0.04350543022155762
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11330803483724594
INFO:birdwatch.matrix_factorization:epoch 40 0.1095086932182312
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08193044364452362
INFO:birdwatch.matrix_factorization:epoch 60 0.10819301754236221
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07915127277374268
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.364295 | time=151.6s
INFO:birdwatch.matrix_factorization:epoch 80 0.10814684629440308
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07923396676778793
INFO:birdwatch.matrix_factorization:epoch 220 0.04339803010225296
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11313688009977341
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.276680 | time=174.2s
INFO:birdwatch.matrix_factorization:epoch 100 0.1081295982003212
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07899565994739532
INFO:birdwatch.matrix_factorization:Num epochs: 103
INFO:birdwatch.matrix_factorization:epoch 103 0.10812994092702866
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0791429802775383
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1732679158449173
INFO:birdwatch.constants:Final round MF elapsed time: 49.07 secs (0.82 mins)
INFO:birdwatch.mf_base_scorer:In MFMultiGroupScorer_1 prescoring, about to call diligence with 3500898 final round ratings.
INFO:birdwatch.matrix_factorization:epoch 80 0.10968925058841705
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08260420709848404
INFO:birdwatch.matrix_factorization:epoch 80 0.10968227684497833
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0825948491692543
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
None,
raterInitState:
raterParticipantId internalRaterFactor1
0 00003B703F86036C51F4F4B4C9F77B00C92D882421DA73... -0.455396
1 00018D8DDD8FE5AD262631A9CA08190AB95942067312FD... -0.096143
2 0001C21FD89AC65310D4D74174C0986CDF457DA24DADAB... 0.010154
3 0003B87251FE6860759A856C73472561F9A37C4813053E... -0.340809
4 0003E67BB62E658363186A00B13637CF1A58748C4E4ECE... 0.186204
... ... ...
39078 FFEF7AD019F0E1EE28157E1298D5469164E8D7AF2CA91D... -0.114137
39079 FFF3E935633C6870DE7674D0681C5821BC408073C84A36... 0.118961
39080 FFF89590FF300D0348631F2F16AA908F663A888A3F82E0... 0.386905
39081 FFFBC05DB8408BB532985642C4DE00EC619B062CB60E2E... 0.295953
39082 FFFE8C4E72CFDBD164D87E0FDA30F8334EC8B6013F1238... 0.346068
[39083 rows x 2 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 39083, vs. num we are initializing: 39083
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 39083
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen
INFO:birdwatch.reputation_matrix_factorization:Round 1:
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=17.285879 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.650838 | time=8.0s
INFO:birdwatch.matrix_factorization:epoch 240 0.043300457298755646
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11295776069164276
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.230716 | time=15.9s
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.275213 | time=203.6s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.163003 | time=23.9s
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.143131 | time=31.8s
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.134256 | time=39.8s
INFO:birdwatch.matrix_factorization:epoch 260 0.04321220517158508
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11280038207769394
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.129236 | time=47.7s
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.274372 | time=232.6s
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.126004 | time=55.7s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.321676 | time=230.6s
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.123863 | time=63.6s
INFO:birdwatch.matrix_factorization:epoch 280 0.04312866926193237
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1126420646905899
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.122368 | time=71.6s
INFO:birdwatch.matrix_factorization:epoch 100 0.10967634618282318
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08246105164289474
INFO:birdwatch.matrix_factorization:Num epochs: 101
INFO:birdwatch.matrix_factorization:epoch 101 0.10967634618282318
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08246105164289474
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17131315171718597
INFO:birdwatch.constants:Final round MF elapsed time: 615.45 secs (10.26 mins)
INFO:birdwatch.mf_base_scorer:In MFExpansionPlusScorer prescoring, about to call diligence with 61623294 final round ratings.
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.121328 | time=79.5s
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.273835 | time=261.9s
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing:
tensor(0.7216, requires_grad=True)
INFO:birdwatch.reputation_matrix_factorization:
Round 2: learn rater rep (and everything else), freeze note intercept
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.121299 | time=0.0s
INFO:birdwatch.matrix_factorization:epoch 100 0.10966924577951431
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08245106041431427
INFO:birdwatch.matrix_factorization:Num epochs: 101
INFO:birdwatch.matrix_factorization:epoch 101 0.10966924577951431
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08245106041431427
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17132236063480377
INFO:birdwatch.constants:Final round MF elapsed time: 634.08 secs (10.57 mins)
INFO:birdwatch.mf_base_scorer:In MFExpansionScorer prescoring, about to call diligence with 61497058 final round ratings.
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.996462 | time=7.7s
INFO:birdwatch.matrix_factorization:epoch 300 0.04304852336645126
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11248570680618286
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.985533 | time=15.4s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.984703 | time=23.1s
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.984676 | time=25.7s
INFO:birdwatch.reputation_matrix_factorization:
Round 3: fit intercepts and global intercept with everything else frozen
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.477476 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.273465 | time=290.6s
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing:
tensor(1.6498, requires_grad=True)
INFO:birdwatch.reputation_matrix_factorization:
Round 2: learn rater rep (and everything else), freeze note intercept
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.273455 | time=0.2s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.393201 | time=4.5s
INFO:birdwatch.matrix_factorization:epoch 320 0.04297034814953804
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1123419925570488
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.392171 | time=9.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.392128 | time=11.2s
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing:
tensor(1.8757, requires_grad=True)
INFO:birdwatch.diligence_model:Low diligence training loss: 3.1213, 1.9847, 0.3921
INFO:birdwatch.scorer:MFMultiGroupScorer_1 Low Diligence MF elapsed time: 120.93 secs (2.02 mins)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.63 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.70 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.61 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.61 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.71 secs (0.01 mins)
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.312581 | time=303.4s
INFO:birdwatch.matrix_factorization:epoch 340 0.04289426654577255
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1121964380145073
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.904213 | time=28.4s
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
None,
raterInitState:
raterParticipantId internalRaterFactor1
0 000011269AD6F327AED0F4086A732B4052F9D28E8791E1... -0.166164
1 00003B703F86036C51F4F4B4C9F77B00C92D882421DA73... -0.445351
2 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... 0.712713
3 00005300B9017670433392BF6767238D54E058EC25D5C5... -0.279733
4 00007B885907790E492F8C9A31F1AFC20831279328C263... 0.462355
... ... ...
489795 FFFFAB2FDBC1968F4CFE97A86D88963D702B636365B6CD... 0.043759
489796 FFFFBBAB3C66ABB4DBC2A3B486C3C673345C89B5858465... 0.211379
489797 FFFFC46B8555A97065DB39F7D600C8BB643F7F3EBD810E... 0.059706
489798 FFFFC819886B2F837503D840D59EE8321A835AAF2B5C1E... -0.215199
489799 FFFFFE8909485374E33854B934713713CAC93CDB50C9D0... 0.546871
[489800 rows x 2 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 489800, vs. num we are initializing: 489800
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 489800
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen
INFO:birdwatch.reputation_matrix_factorization:Round 1:
INFO:birdwatch.matrix_factorization:epoch 360 0.04282209277153015
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11206253618001938
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=15.837796 | time=0.7s | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 34.04 secs (0.57 mins) | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 000011269AD6F327AED0F4086A732B4052F9D28E8791E1... -0.166075 | |
1 00003B703F86036C51F4F4B4C9F77B00C92D882421DA73... -0.445450 | |
2 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... 0.712960 | |
3 00005300B9017670433392BF6767238D54E058EC25D5C5... -0.277656 | |
4 00007B885907790E492F8C9A31F1AFC20831279328C263... 0.462468 | |
... ... ... | |
489446 FFFFAB2FDBC1968F4CFE97A86D88963D702B636365B6CD... 0.042596 | |
489447 FFFFBBAB3C66ABB4DBC2A3B486C3C673345C89B5858465... 0.213774 | |
489448 FFFFC46B8555A97065DB39F7D600C8BB643F7F3EBD810E... 0.060616 | |
489449 FFFFC819886B2F837503D840D59EE8321A835AAF2B5C1E... -0.215851 | |
489450 FFFFFE8909485374E33854B934713713CAC93CDB50C9D0... 0.546730 | |
[489451 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 489451, vs. num we are initializing: 489451 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 489451 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.898035 | time=56.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=15.951445 | time=0.7s | |
INFO:birdwatch.constants:MFMultiGroupScorer_1: Compute tag thresholds for percentiles elapsed time: 9.96 secs (0.17 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.matrix_factorization:epoch 380 0.042749322950839996 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11193254590034485 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.896855 | time=85.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.896819 | time=95.6s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.250449 | time=0.2s | |
INFO:birdwatch.matrix_factorization:epoch 400 0.04268262907862663 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11180592328310013 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.309645 | time=378.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.175483 | time=18.0s | |
INFO:birdwatch.matrix_factorization:epoch 420 0.042615145444869995 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11168880760669708 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.716438 | time=82.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.174674 | time=35.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=070 | loss=0.174644 | time=41.7s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(3.6028, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 3.2735, 1.8968, 0.1746 | |
INFO:birdwatch.scorer:MFTopicScorer_GazaConflict Low Diligence MF elapsed time: 439.58 secs (7.33 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.03 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.04 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.04 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.04 secs (0.00 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.04 secs (0.00 mins) | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 2.12 secs (0.04 mins) | |
INFO:birdwatch.matrix_factorization:epoch 440 0.04255577549338341 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11159063130617142 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.718773 | time=94.9s | |
INFO:birdwatch.constants:MFTopicScorer_GazaConflict: Compute tag thresholds for percentiles elapsed time: 22.64 secs (0.38 mins) | |
INFO:birdwatch.matrix_factorization:epoch 460 0.042498402297496796 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1114979088306427 | |
INFO:birdwatch.matrix_factorization:Num epochs: 465 | |
INFO:birdwatch.matrix_factorization:epoch 465 0.042490653693675995 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11147822439670563 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.15220145881175995 | |
INFO:birdwatch.scorer:MFGroupScorer_14 First MF/stable init elapsed time: 1044.92 secs (17.42 mins) | |
INFO:birdwatch.mf_base_scorer:Performing rep-filtering for MFGroupScorer_14 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.308357 | time=459.3s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge( | |
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.64 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.74 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.66 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.63 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.77 secs (0.01 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.382811 | time=164.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.383152 | time=175.4s | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.73 secs (0.56 mins) | |
INFO:birdwatch.scorer:MFGroupScorer_14 Compute scored notes elapsed time: 55.40 secs (0.92 mins) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in get_ratings_before_note_status_and_public_tsv, at line 68: ratingsWithNoteLabelInfo = ratings[ | |
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed) | |
INFO:birdwatch.note_ratings:Total ratings: 11191003 post-tombstones and 4014 pre-tombstones | |
INFO:birdwatch.note_ratings:Total ratings created before statuses: 9098877, including 9098606 post-tombstones and 271 pre-tombstones. | |
INFO:birdwatch.note_ratings:Total valid ratings: 441893 | |
INFO:birdwatch.scorer:MFGroupScorer_14 Compute valid ratings elapsed time: 14.34 secs (0.24 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
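This `SettingWithCopyWarning` appears when a column is assigned on a frame that pandas treats as a slice of another frame. Here that frame is `scoredNotes` at `helpfulness_scores.py:31`. A common remedy is to take an explicit `.copy()` when the subset is created, so the later `.loc` assignment targets an independent frame. The sketch below shows this on hypothetical toy data; it is not taken from the scorer's source.

```python
import warnings

import pandas as pd

# Hypothetical toy frame; only the column name noteCount echoes the log.
notes = pd.DataFrame({"noteId": [1, 2, 3], "status": ["CRH", "NMR", "CRH"]})

# .copy() gives the filtered subset its own data, so pandas no longer
# treats it as a possible view of `notes`.
scoredNotes = notes[notes["status"] == "CRH"].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    scoredNotes.loc[:, "noteCount"] = 1

# Names of any warnings the assignment raised (none should be SettingWithCopyWarning).
warning_names = [w.category.__name__ for w in caught]
```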
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_14 Helpfulness scores pre-harassment elapsed time: 0.95 secs (0.02 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.307673 | time=531.8s | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 59422 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 135773 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 38591 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 37329 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 10305089 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 7696981 | |
INFO:birdwatch.scorer:MFGroupScorer_14 Filtering by helpfulness score elapsed time: 13.75 secs (0.23 mins) | |
INFO:birdwatch.tag_consensus:-------------------Training for tag notHelpfulSpamHarassmentOrAbuse------------------- | |
INFO:birdwatch.tag_consensus:Pre-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 4600537 | |
1 266733 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 2829711 | |
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note: | |
Num Ratings: 4476956, Num Unique Notes Rated: 240923, Num Unique Raters: 35788 | |
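The `process_data` message above reports that ratings were filtered to raters with at least 10 ratings and notes with at least 5 raters. Dropping rows for one condition can push the other below its threshold, so filters like this are often repeated until they stabilize. The sketch below makes three assumptions: the filter is iterative, there is one rating per (rater, note) pair, and the helper name `filter_min_counts` (which is hypothetical). The scorer's actual implementation may differ, for example by making a single pass.

```python
import pandas as pd


def filter_min_counts(ratings: pd.DataFrame, min_per_rater: int = 10, min_per_note: int = 5) -> pd.DataFrame:
    """Hypothetical helper: drop sparse raters and notes until both thresholds hold.

    Assumes one row per (raterParticipantId, noteId) pair, so the row count per
    note equals its number of distinct raters.
    """
    while not ratings.empty:
        per_rater = ratings.groupby("raterParticipantId")["noteId"].transform("size")
        per_note = ratings.groupby("noteId")["raterParticipantId"].transform("size")
        keep = (per_rater >= min_per_rater) & (per_note >= min_per_note)
        if keep.all():
            break
        # Removing rows can push other raters or notes below threshold, so loop again.
        ratings = ratings[keep]
    return ratings


# Toy example with small thresholds: rater "c" has one rating, so its row is
# removed; the remaining four rows satisfy both thresholds.
toy = pd.DataFrame({"raterParticipantId": ["a", "a", "b", "b", "c"], "noteId": [1, 2, 1, 2, 1]})
kept = filter_min_counts(toy, min_per_rater=2, min_per_note=2)
```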
INFO:birdwatch.tag_consensus:Post-filtering tag label breakdown notHelpfulSpamHarassmentOrAbuseLabel | |
0 4247007 | |
1 229949 | |
dtype: int64 | |
INFO:birdwatch.tag_consensus:Number of rows with no tag label 0 | |
INFO:birdwatch.tag_consensus:notHelpfulSpamHarassmentOrAbuse Positive Rate: 0.05136280097459077 | |
INFO:birdwatch.matrix_factorization:Using pos weight: 18.469343202188313 with BCEWithLogitsLoss | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 35788, Notes: 240923 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
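The pandas `FutureWarning` above flags `noteInit[col].fillna(0.0, inplace=True)`. The same warning is emitted later for `helpfulness_scores.py:173`. Under copy-on-write in pandas 3.0, the column selection behaves as a copy, so the in-place fill would no longer reach `noteInit`. The warning text itself suggests two replacements. This sketch applies both to a hypothetical stand-in for `noteInit`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for noteInit; the column names mirror the log.
noteInit = pd.DataFrame(
    {"internalNoteIntercept": [0.4, np.nan, -0.1], "internalNoteFactor1": [np.nan, 0.2, 0.3]}
)

# Replacement 1: assign the filled column back instead of filling in place.
noteInit["internalNoteIntercept"] = noteInit["internalNoteIntercept"].fillna(0.0)

# Replacement 2: call fillna on the frame itself with a {column: value} mapping.
noteInit.fillna({"internalNoteFactor1": 0.0}, inplace=True)
```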
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :2.0 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 18.58251806593808 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 125.09656868223986 | |
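The statistics logged for the `notHelpfulSpamHarassmentOrAbuse` tag model can be cross-checked from the post-filtering counts:

- The positive rate equals positives divided by all filtered ratings.
- The logged pos weight matches negatives divided by positives. This is the usual choice for the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`.
- The per-note and per-user densities equal the 4,476,956 filtered ratings divided by the note and user counts.

These relationships are inferred from the numbers, not read from the scorer's source.

```python
# Post-filtering counts copied from the log above.
negatives = 4_247_007  # notHelpfulSpamHarassmentOrAbuseLabel == 0
positives = 229_949    # notHelpfulSpamHarassmentOrAbuseLabel == 1
num_notes = 240_923
num_users = 35_788

num_ratings = negatives + positives          # matches "Num Ratings: 4476956"
positive_rate = positives / num_ratings      # logged: 0.05136280097459077
pos_weight = negatives / positives           # logged: 18.469343202188313
ratings_per_note = num_ratings / num_notes   # logged: 18.58251806593808
ratings_per_user = num_ratings / num_users   # logged: 125.09656868223986
```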
INFO:birdwatch.matrix_factorization:epoch 0 3.189854145050049 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 1.3274030685424805 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.6892223954200745 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.3509666919708252 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.4422963261604309 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.28034254908561707 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.341896 | time=244.5s | |
INFO:birdwatch.matrix_factorization:epoch 60 0.4078294634819031 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2675808072090149 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.4029937982559204 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2657165825366974 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.342025 | time=255.8s | |
INFO:birdwatch.matrix_factorization:epoch 100 0.40232470631599426 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.26539942622184753 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.307265 | time=604.5s | |
INFO:birdwatch.matrix_factorization:Num epochs: 118 | |
INFO:birdwatch.matrix_factorization:epoch 118 0.4022292494773865 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.2653917074203491 | |
INFO:birdwatch.matrix_factorization:Global Intercept: -0.25583866238594055 | |
INFO:birdwatch.scorer:MFGroupScorer_14 Harassment tag consensus elapsed time: 65.70 secs (1.09 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:31: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
scoredNotes.loc[:, c.noteCountKey] = 1 | |
JOIN ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 109: authorCounts.join( | |
PandasTypeError: Output mismatch on index: result=object expected=<class 'numpy.int64'> (allowed) | |
PandasTypeError: Output mismatch on crhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on crnhBool: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on noteCount: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingAgreesWithNoteStatus: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on ratingCount: result=float64 expected=int64 (allowed) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py, in compute_general_helpfulness_scores, at line 167: helpfulnessScores = helpfulnessScores.merge( | |
PandasTypeError: Output mismatch on totalHelpfulHarassmentPenalty: result=float64 expected=int64 (allowed) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:173: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
helpfulnessScores[c.totalHelpfulHarassmentRatingsPenaltyKey].fillna(0, inplace=True) | |
INFO:birdwatch.scorer:MFGroupScorer_14 Helpfulness scores post-harassment elapsed time: 2.08 secs (0.03 mins) | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 59422 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 135773 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 35090 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 33828 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 10305089 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 5960805 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 33828, Notes: 391715 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.02 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 15.217198728667526 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 176.20920539198298 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.16280224919319153 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1430436670780182 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.12330132722854614 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09288666397333145 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.333297 | time=324.7s | |
INFO:birdwatch.matrix_factorization:epoch 40 0.11992843449115753 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09041672945022583 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.11925151944160461 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08986031264066696 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.333314 | time=337.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.307006 | time=678.0s | |
INFO:birdwatch.matrix_factorization:epoch 80 0.1190737709403038 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08961638063192368 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.11900263279676437 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08947830647230148 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.11896945536136627 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08936286717653275 | |
INFO:birdwatch.matrix_factorization:Num epochs: 124 | |
INFO:birdwatch.matrix_factorization:epoch 124 0.11896771192550659 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08934778720140457 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 33828, Notes: 391715 | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.02 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:epoch 0 0.08717896789312363 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.30558687448501587 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.330579 | time=405.0s | |
INFO:birdwatch.matrix_factorization:epoch 20 0.05134270712733269 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.12314626574516296 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.04723444581031799 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11297580599784851 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.306842 | time=751.4s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(1.0968, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.306838 | time=0.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.330473 | time=417.9s | |
INFO:birdwatch.matrix_factorization:epoch 60 0.04607836902141571 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11370635032653809 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.04567534849047661 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11388987302780151 | |
INFO:birdwatch.matrix_factorization:epoch 100 0.04547697305679321 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11417226493358612 | |
INFO:birdwatch.matrix_factorization:epoch 120 0.045362748205661774 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11452613025903702 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.329425 | time=486.1s | |
INFO:birdwatch.matrix_factorization:epoch 140 0.045282356441020966 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11480464786291122 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.069114 | time=70.5s | |
INFO:birdwatch.matrix_factorization:epoch 160 0.04522169381380081 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1150297150015831 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.329241 | time=498.9s | |
INFO:birdwatch.matrix_factorization:epoch 180 0.04516759514808655 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11521623283624649 | |
INFO:birdwatch.matrix_factorization:Num epochs: 182 | |
INFO:birdwatch.matrix_factorization:epoch 182 0.045167598873376846 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.11522604525089264 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.16320422291755676 | |
INFO:birdwatch.constants:Final round MF elapsed time: 234.84 secs (3.91 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_14 prescoring, about to call diligence with 5960805 final round ratings. | |
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
None, | |
raterInitState: | |
raterParticipantId internalRaterFactor1 | |
0 0002725E706CF18C040E21F30CE2D39994513C3BB8CF58... 0.091846 | |
1 00032CF270BEF4007D6B24E33135CD078C72B0965FCD8D... -0.855938 | |
2 00054DA8CA53842EE3042D2E203830D7F023E91EC47259... -0.675938 | |
3 000818E860FC3D0209D9E2493FC76B78311313A011891F... -0.430495 | |
4 000AC81189581CADFDC18CD0617507240DAB2F2CD05AAC... -0.339038 | |
... ... ... | |
33823 FFFD9C3BC7BB3A78D72C67E34A7BDEFAAFFC485AAE049D... 0.285925 | |
33824 FFFDDE9AE1DFCB76019D1A523D5CC586BB1AB22B878801... 0.372995 | |
33825 FFFF4DD649728988010BBC2B953A59797EA70028B58EA8... -0.670635 | |
33826 FFFFC819886B2F837503D840D59EE8321A835AAF2B5C1E... -0.212466 | |
33827 FFFFFE8909485374E33854B934713713CAC93CDB50C9D0... 0.653602 | |
[33828 rows x 2 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 33828, vs. num we are initializing: 33828 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 33828 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalRaterReputation | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteFactor1 | |
INFO:birdwatch.reputation_matrix_factorization:Not initializing internalNoteIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Reputation Matrix Factorization: rater reputation frozen | |
INFO:birdwatch.reputation_matrix_factorization:Round 1: | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=16.812336 | time=0.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=3.595252 | time=14.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=3.191230 | time=29.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.058878 | time=140.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=3.124678 | time=43.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.328831 | time=568.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=3.104377 | time=58.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.328612 | time=579.5s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=3.095136 | time=72.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=3.089811 | time=86.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=3.086323 | time=101.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.058023 | time=210.7s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.339636 | time=0.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.083937 | time=115.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.328487 | time=650.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.082276 | time=130.2s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.081114 | time=144.6s | |
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing: | |
tensor(0.4415, requires_grad=True) | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 2: learn rater rep (and everything else), freeze note intercept | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.081080 | time=0.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=240 | loss=3.328250 | time=660.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=1.925755 | time=13.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.254762 | time=48.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=1.914432 | time=27.6s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.913625 | time=41.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.913596 | time=46.0s | |
INFO:birdwatch.reputation_matrix_factorization: | |
Round 3: fit intercepts and global intercept with everything else frozen | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.439497 | time=0.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.368043 | time=8.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.367151 | time=16.8s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.328269 | time=731.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.253774 | time=96.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.367114 | time=21.1s | |
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing: | |
tensor(1.5775, requires_grad=True) | |
INFO:birdwatch.diligence_model:Low diligence training loss: 3.0811, 1.9136, 0.3671 | |
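Epoch lines from the scorers running in parallel are interleaved here. As a result, one scorer's loss curve cannot be read top to bottom. The summary `Low diligence training loss: 3.0811, 1.9136, 0.3671` appears to report the final loss of rounds 1, 2 and 3. Compare `epoch=300 | loss=3.081114`, `epoch=100 | loss=1.913596` and `epoch=075 | loss=0.367114` above.

Below is a minimal Python sketch for extracting `(epoch, loss, seconds)` from such lines. It assumes only the `epoch=NNN | loss=X | time=Ys` format visible in this log. The names `EPOCH_RE`, `EpochRecord` and `parse_epoch_line` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Matches the "epoch=NNN | loss=X | time=Ys" fragment seen in this log.
EPOCH_RE = re.compile(r"epoch=(\d+) \| loss=([0-9.]+) \| time=([0-9.]+)s")


class EpochRecord(NamedTuple):
    epoch: int
    loss: float
    seconds: float


def parse_epoch_line(line: str) -> Optional[EpochRecord]:
    """Return the parsed record, or None for lines that are not epoch logs."""
    match = EPOCH_RE.search(line)
    if match is None:
        return None
    epoch, loss, seconds = match.groups()
    return EpochRecord(int(epoch), float(loss), float(seconds))


if __name__ == "__main__":
    sample = [
        "INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=1.913625 | time=41.4s",
        "INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=1.913596 | time=46.0s",
        "INFO:birdwatch.scorer:MFGroupScorer_14 Low Diligence MF elapsed time: 219.54 secs (3.66 mins)",
    ]
    # Keep only the lines that actually carry an epoch record.
    records = [r for r in (parse_epoch_line(s) for s in sample) if r is not None]
    for record in records:
        print(record)
```

The epoch lines carry no scorer name, so this recovers the values but not which scorer produced them.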
INFO:birdwatch.scorer:MFGroupScorer_14 Low Diligence MF elapsed time: 219.54 secs (3.66 mins)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
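These `MERGE ERROR` entries are marked `(allowed)`, and the run went on to succeed, so they were evidently non-fatal. They reflect standard pandas behaviour. When a left merge leaves some rows without a match, the missing values become `NaN`. A NumPy `int64` column cannot hold `NaN`, so pandas upcasts it to `float64`.

The minimal sketch below shows the effect and two ways to keep integer counts. The toy frames are illustrative. Treating a missing count as 0 is an assumption that may not match the scorer's intended semantics.

```python
import pandas as pd

# Toy stand-ins: every note, plus per-note rating counts for only some notes.
notes = pd.DataFrame({"noteId": [1, 2, 3]})
counts = pd.DataFrame({"noteId": [1, 2], "numRatings": [5, 7]})  # int64

# Note 3 has no match, so numRatings gets NaN and the column becomes float64.
merged = notes.merge(counts, on="noteId", how="left")
print(merged["numRatings"].dtype)  # float64

# Option 1 (assumes "no ratings" really means 0): fill the gaps, then cast back.
filled = merged.assign(numRatings=merged["numRatings"].fillna(0).astype("int64"))

# Option 2: use the nullable Int64 extension dtype, which keeps <NA> for misses.
counts_nullable = counts.astype({"numRatings": "Int64"})
nullable = notes.merge(counts_nullable, on="noteId", how="left")
print(nullable["numRatings"].dtype)  # Int64
```

Option 2 preserves the distinction between "zero ratings" and "no row to merge". That distinction is lost in option 1.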
INFO:birdwatch.reputation_matrix_factorization:epoch=270 | loss=3.328033 | time=742.0s
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.63 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.76 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.69 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.60 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.69 secs (0.01 mins)
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.253734 | time=119.8s
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing:
tensor(2.5740, requires_grad=True)
INFO:birdwatch.diligence_model:Low diligence training loss: 3.3068, 2.0580, 0.2537
INFO:birdwatch.scorer:MFCoreScorer Low Diligence MF elapsed time: 1148.13 secs (19.14 mins)
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.25 secs (0.55 mins)
INFO:birdwatch.constants:MFGroupScorer_14: Compute tag thresholds for percentiles elapsed time: 16.01 secs (0.27 mins)
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.328127 | time=812.3s
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing:
tensor(1.1629, requires_grad=True)
INFO:birdwatch.reputation_matrix_factorization:
Round 2: learn rater rep (and everything else), freeze note intercept
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.328123 | time=0.6s
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge(
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed)
INFO:birdwatch.reputation_matrix_factorization:epoch=300 | loss=3.327889 | time=820.8s
INFO:birdwatch.reputation_matrix_factorization:After round 1, global bias: Parameter containing:
tensor(1.1639, requires_grad=True)
INFO:birdwatch.reputation_matrix_factorization:
Round 2: learn rater rep (and everything else), freeze note intercept
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.327885 | time=0.6s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.088579 | time=76.1s
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.65 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.13 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.82 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.05 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.70 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.60 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.74 secs (0.01 mins)
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.088306 | time=75.9s
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 33.90 secs (0.57 mins)
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.078291 | time=150.2s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.078082 | time=152.1s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.077414 | time=223.3s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.077211 | time=228.5s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.077211 | time=228.5s
INFO:birdwatch.reputation_matrix_factorization:
Round 3: fit intercepts and global intercept with everything else frozen
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.336494 | time=0.6s
INFO:birdwatch.reputation_matrix_factorization:epoch=100 | loss=2.077385 | time=247.7s
INFO:birdwatch.reputation_matrix_factorization:
Round 3: fit intercepts and global intercept with everything else frozen
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.336255 | time=0.6s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.249541 | time=51.5s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.249273 | time=50.4s
INFO:birdwatch.constants:MFCoreScorer: Compute tag thresholds for percentiles elapsed time: 205.08 secs (3.42 mins)
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.248245 | time=100.4s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.248514 | time=102.7s
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.248205 | time=124.2s
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing:
tensor(2.6551, requires_grad=True)
INFO:birdwatch.diligence_model:Low diligence training loss: 3.3281, 2.0774, 0.2482
INFO:birdwatch.scorer:MFExpansionPlusScorer Low Diligence MF elapsed time: 1268.65 secs (21.14 mins)
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.248473 | time=128.4s
INFO:birdwatch.reputation_matrix_factorization:After round 3, global bias: Parameter containing:
tensor(2.6519, requires_grad=True)
INFO:birdwatch.diligence_model:Low diligence training loss: 3.3279, 2.0772, 0.2485
INFO:birdwatch.scorer:MFExpansionScorer Low Diligence MF elapsed time: 1261.19 secs (21.02 mins)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge(
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.55 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.13 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.69 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.57 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.53 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.64 secs (0.01 mins)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.55 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.12 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.67 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.57 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.00 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.52 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.62 secs (0.01 mins)
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 30.42 secs (0.51 mins)
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 30.69 secs (0.51 mins)
INFO:birdwatch.constants:MFExpansionScorer: Compute tag thresholds for percentiles elapsed time: 195.79 secs (3.26 mins)
INFO:birdwatch.constants:MFExpansionPlusScorer: Compute tag thresholds for percentiles elapsed time: 193.37 secs (3.22 mins)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge(
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in _prescore_notes_and_users, at line 887: raterModelOutput = raterParams.merge(
PandasTypeError: Output mismatch on totalRatingsMadeByRater: result=float64 expected=int64 (allowed)
INFO:birdwatch.run_scoring:Got model results from all scorers.
INFO:birdwatch.run_scoring:----
Completed individual scorers. Ran in parallel: True. Succeeded in 5344.91 seconds.
Individual scorers: (name, runtime): [('MFCoreScorer', '76.92 mins'), ('MFExpansionScorer', '83.77 mins'), ('MFExpansionPlusScorer', '83.83 mins'), ('ReputationScorer', '33.15 mins'), ('MFGroupScorer_13', '33.63 mins'), ('MFGroupScorer_12', '2.57 mins'), ('MFGroupScorer_11', '3.27 mins'), ('MFGroupScorer_10', '2.72 mins'), ('MFGroupScorer_9', '8.08 mins'), ('MFGroupScorer_8', '2.60 mins'), ('MFGroupScorer_7', '3.70 mins'), ('MFGroupScorer_6', '7.19 mins'), ('MFGroupScorer_5', '2.45 mins'), ('MFGroupScorer_4', '3.87 mins'), ('MFGroupScorer_3', '8.05 mins'), ('MFGroupScorer_2', '3.23 mins'), ('MFGroupScorer_1', '7.78 mins'), ('MFGroupScorer_14', '30.27 mins'), ('MFTopicScorer_Unassigned', '0.18 mins'), ('MFTopicScorer_UkraineConflict', '3.62 mins'), ('MFTopicScorer_GazaConflict', '11.94 mins'), ('MFTopicScorer_MessiRonaldo', '0.62 mins'), ('MFMultiGroupScorer_1', '8.05 mins')]
----
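The per-scorer runtimes are printed as a Python list literal, so `ast.literal_eval` can read them back. The 23 entries sum to roughly 421.5 minutes of scorer time. The wall-clock time is 5344.91 seconds, about 89.1 minutes. That is consistent with the parallel run taking only a little longer than its slowest scorers: `MFExpansionPlusScorer` at 83.83 mins and `MFExpansionScorer` at 83.77 mins.

Below is a minimal sketch using three entries copied from the log line above. The helper name `parse_runtimes` is hypothetical.

```python
import ast

# Three (name, runtime) pairs copied from the "Individual scorers" line above;
# the full list in the log has 23 entries.
runtimes_literal = (
    "[('MFCoreScorer', '76.92 mins'), ('MFExpansionScorer', '83.77 mins'), "
    "('MFExpansionPlusScorer', '83.83 mins')]"
)


def parse_runtimes(literal: str) -> dict[str, float]:
    """Turn the logged list of (name, 'N.NN mins') tuples into {name: minutes}."""
    pairs = ast.literal_eval(literal)  # evaluates Python literals only, no code
    return {name: float(text.removesuffix(" mins")) for name, text in pairs}


minutes = parse_runtimes(runtimes_literal)
slowest = max(minutes, key=minutes.get)
print(slowest, minutes[slowest])       # MFExpansionPlusScorer 83.83
print(round(sum(minutes.values()), 2))  # 244.52
```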
/home/ubuntu/communitynotes/sourcecode/scoring/pandas_utils.py:364: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation. | |
result = self._origConcat(*args, **kwargs) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/run_scoring.py, in combine_prescorer_scorer_results, at line 484: prescoringNoteModelOutput = pd.concat( | |
PandasTypeError: Type expectation mismatch on noteId: found=object expected=int64 | |
PandasTypeError: DataFrame concat on noteId: output=object inputs=[dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('O'), dtype('int64'), dtype('int64'), dtype('int64'), dtype('int64')] (allowed) | |
PandasTypeError: DataFrame concat on internalNoteIntercept: output=float32 inputs=[dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('O'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')] (allowed) | |
PandasTypeError: DataFrame concat on internalNoteFactor1: output=float32 inputs=[dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('O'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')] (allowed) | |
PandasTypeError: DataFrame concat on lowDiligenceNoteIntercept: output=float32 inputs=[dtype('float32'), dtype('float32'), dtype('float32'), dtype('float64'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('O'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')] (allowed) | |
PandasTypeError: DataFrame concat on lowDiligenceNoteFactor1: output=float32 inputs=[dtype('float32'), dtype('float32'), dtype('float32'), dtype('float64'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('O'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')] (allowed) | |
PandasTypeError: Type expectation mismatch on noteId: found=object expected=int64 | |
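In the `CONCAT ERROR` at line 484, 22 of the 23 scorer outputs carry `int64` note IDs. One input, the 19th, is `object` in every column. That pattern would be consistent with an empty frame created with column names only. Together with the `FutureWarning` above, this yields an `object` `noteId`. Whether that frame really is empty is an inference from the dtypes; the log does not say so.

Below is a minimal sketch of one way to guard such a concat: drop empty or all-NA frames before concatenating, as the warning itself suggests. The frames and the helper `concat_non_empty` are hypothetical.

```python
import pandas as pd

# Hypothetical per-scorer outputs: two populated frames plus one frame created with
# column names but no rows, whose columns therefore default to object dtype.
outputs = [
    pd.DataFrame({
        "noteId": pd.Series([1, 2], dtype="int64"),
        "internalNoteIntercept": pd.Series([0.1, 0.2], dtype="float32"),
    }),
    pd.DataFrame(columns=["noteId", "internalNoteIntercept"]),
    pd.DataFrame({
        "noteId": pd.Series([3], dtype="int64"),
        "internalNoteIntercept": pd.Series([0.3], dtype="float32"),
    }),
]


def concat_non_empty(frames: list[pd.DataFrame]) -> pd.DataFrame:
    """Concatenate only frames holding at least one non-NA value (hypothetical helper)."""
    # An empty frame counts as all-NA here, so it is dropped as well.
    kept = [f for f in frames if not f.isna().all().all()]
    if not kept:
        return pd.DataFrame()
    return pd.concat(kept, ignore_index=True)


combined = concat_non_empty(outputs)
print(combined.dtypes)  # noteId int64, internalNoteIntercept float32
```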
/home/ubuntu/communitynotes/sourcecode/scoring/pandas_utils.py:364: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation. | |
result = self._origConcat(*args, **kwargs) | |
/home/ubuntu/communitynotes/sourcecode/scoring/pandas_utils.py:364: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation. | |
result = self._origConcat(*args, **kwargs) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/run_scoring.py, in combine_prescorer_scorer_results, at line 505: raterParamsUnfilteredMultiScorers = pd.concat( | |
PandasTypeError: DataFrame concat on internalRaterIntercept: output=float32 inputs=[dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('O'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')] (allowed) | |
PandasTypeError: DataFrame concat on internalRaterFactor1: output=float32 inputs=[dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('O'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')] (allowed) | |
PandasTypeError: DataFrame concat on crhCrnhRatioDifference: output=float64 inputs=[dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('O'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64')] (allowed) | |
PandasTypeError: DataFrame concat on meanNoteScore: output=float64 inputs=[dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('O'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64')] (allowed) | |
PandasTypeError: DataFrame concat on raterAgreeRatio: output=float64 inputs=[dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('O'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64')] (allowed) | |
PandasTypeError: DataFrame concat on aboveHelpfulnessThreshold: output=object inputs=[dtype('bool'), dtype('bool'), dtype('bool'), dtype('float64'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('bool'), dtype('O'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('bool')] (allowed) | |
PandasTypeError: DataFrame concat on internalRaterReputation: output=float32 inputs=[dtype('float64'), dtype('float64'), dtype('float64'), dtype('float32'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64'), dtype('float64')] (allowed) | |
PandasTypeError: DataFrame concat on lowDiligenceRaterIntercept: output=float32 inputs=[dtype('float32'), dtype('float32'), dtype('float32'), dtype('float64'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('O'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')] (allowed) | |
PandasTypeError: DataFrame concat on lowDiligenceRaterFactor1: output=float32 inputs=[dtype('float32'), dtype('float32'), dtype('float32'), dtype('float64'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('O'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')] (allowed) | |
PandasTypeError: DataFrame concat on lowDiligenceRaterReputation: output=float32 inputs=[dtype('float32'), dtype('float32'), dtype('float32'), dtype('float64'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('O'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')] (allowed) | |
PandasTypeError: DataFrame concat on incorrectTagRatingsMadeByRater: output=Int64 inputs=[Int64Dtype(), Int64Dtype(), Int64Dtype(), dtype('float64'), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int64Dtype(), Int8Dtype(), Int64Dtype()] (allowed) | |
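`aboveHelpfulnessThreshold` ends up as `object` because most inputs are `bool` while a few are `float64`. pandas deliberately does not unify `bool` with numeric dtypes, so the common type becomes `object`.

Below is a minimal sketch of the effect. It also shows converting every input to the nullable `boolean` dtype before concatenating, so missing values become `<NA>`. The toy frames are illustrative. The sketch assumes the `float64` inputs hold only 0.0, 1.0 or NaN.

```python
import numpy as np
import pandas as pd

with_flags = pd.DataFrame({"aboveHelpfulnessThreshold": [True, False]})      # bool
without_flags = pd.DataFrame({"aboveHelpfulnessThreshold": [np.nan, 1.0]})  # float64

# bool + float64 has no shared numeric type in pandas, so the result is object.
mixed = pd.concat([with_flags, without_flags], ignore_index=True)
print(mixed["aboveHelpfulnessThreshold"].dtype)  # object

# Casting each input to the nullable "boolean" dtype keeps one logical type;
# astype("boolean") accepts float input only when values are 0.0, 1.0 or NaN.
as_nullable = [
    f.astype({"aboveHelpfulnessThreshold": "boolean"}) for f in (with_flags, without_flags)
]
clean = pd.concat(as_nullable, ignore_index=True)
print(clean["aboveHelpfulnessThreshold"].dtype)  # boolean
```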
INFO:birdwatch.run_scoring:notes total RAM: 122768821 bytes (0.123 GB)
column dtype RAM
0 noteId int64 12432248
1 noteAuthorParticipantId object 12432248
2 createdAtMillis int64 12432248
3 tweetId object 12432248
4 classification object 12432248
5 believable category 1554155
6 harmful category 1554155
7 validationDifficulty category 1554155
8 misleadingOther Int8 3108062
9 misleadingFactualError Int8 3108062
10 misleadingManipulatedMedia Int8 3108062
11 misleadingOutdatedInformation Int8 3108062
12 misleadingMissingImportantContext Int8 3108062
13 misleadingUnverifiedClaimAsFact Int8 3108062
14 misleadingSatire Int8 3108062
15 notMisleadingOther Int8 3108062
16 notMisleadingFactuallyCorrect Int8 3108062
17 notMisleadingOutdatedButNotWhenWritten Int8 3108062
18 notMisleadingClearlySatire Int8 3108062
19 notMisleadingPersonalOpinion Int8 3108062
20 trustworthySources Int8 3108062
21 summary object 12432248
22 isMediaNote Int8 3108062
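The byte counts in these RAM tables look like pandas' shallow memory accounting. The `object` columns report exactly the same 12432248 bytes as the `int64` `noteId` column: 8 bytes per row for 1554031 rows. That is the size of the object pointers, not of the strings they point to. Each nullable `Int8` column reports 2 bytes per row, one for the value and one for the validity mask.

Below is a minimal sketch reproducing that accounting with `DataFrame.memory_usage`. That the scorer measures with `deep=False` is inferred from these numbers; the log does not state it. The toy frame is illustrative.

```python
import numpy as np
import pandas as pd

n = 1_000
df = pd.DataFrame(
    {
        "noteId": np.arange(n, dtype="int64"),
        "summary": ["some note text"] * n,  # object dtype
        "isMediaNote": pd.array([0, 1] * (n // 2), dtype="Int8"),
        "believable": pd.Categorical(["category_a"] * n),
    }
)

shallow = df.memory_usage(index=False, deep=False)  # object columns: pointer size only
deep = df.memory_usage(index=False, deep=True)      # also counts the Python str objects

print(pd.DataFrame({"dtype": df.dtypes.astype(str), "shallow": shallow, "deep": deep}))
```

Under this reading, the `object` columns here, and the 13.133 GB reported for `ratings`, understate the true footprint of the string data.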
INFO:birdwatch.run_scoring:ratings total RAM: 13133224872 bytes (13.133 GB)
column dtype RAM
0 noteId int64 946538720
1 raterParticipantId object 946538720
2 createdAtMillis int64 946538720
3 version Int8 236634680
4 agree Int8 236634680
5 disagree Int8 236634680
6 helpful Int8 236634680
7 notHelpful Int8 236634680
8 helpfulnessLevel category 118317472
9 helpfulOther Int8 236634680
10 helpfulInformative Int8 236634680
11 helpfulClear Int8 236634680
12 helpfulEmpathetic Int8 236634680
13 helpfulGoodSources Int8 236634680
14 helpfulUniqueContext Int8 236634680
15 helpfulAddressesClaim Int8 236634680
16 helpfulImportantContext Int8 236634680
17 helpfulUnbiasedLanguage Int8 236634680
18 notHelpfulOther Int8 236634680
19 notHelpfulIncorrect Int8 236634680
20 notHelpfulSourcesMissingOrUnreliable Int8 236634680
21 notHelpfulOpinionSpeculationOrBias Int8 236634680
22 notHelpfulMissingKeyPoints Int8 236634680
23 notHelpfulOutdated Int8 236634680
24 notHelpfulHardToUnderstand Int8 236634680
25 notHelpfulArgumentativeOrBiased Int8 236634680
26 notHelpfulOffTopic Int8 236634680
27 notHelpfulSpamHarassmentOrAbuse Int8 236634680
28 notHelpfulIrrelevantSources Int8 236634680
29 notHelpfulOpinionSpeculation Int8 236634680
30 notHelpfulNoteNotNeeded Int8 236634680
31 ratedOnTweetId int64 946538720
32 helpfulNum float64 946538720
33 postSelectionValue float64 946538720
34 postSelectionValue_note_author float64 946538720
INFO:birdwatch.run_scoring:noteStatusHistory total RAM: 225817062 bytes (0.226 GB)
column dtype RAM
0 noteId int64 14004048
1 noteAuthorParticipantId object 14004048
2 createdAtMillis float64 14004048
3 timestampMillisOfFirstNonNMRStatus float64 14004048
4 firstNonNMRStatus category 1750630
5 timestampMillisOfCurrentStatus float64 14004048
6 currentStatus category 1750638
7 timestampMillisOfLatestNonNMRStatus float64 14004048
8 mostRecentNonNMRStatus category 1750630
9 timestampMillisOfStatusLock float64 14004048
10 lockedStatus category 1750638
11 timestampMillisOfRetroLock float64 14004048
12 currentCoreStatus category 1750638
13 currentExpansionStatus category 1750638
14 currentGroupStatus category 1750638
15 currentDecidedBy category 1751254
16 currentModelingGroup float64 14004048
17 timestampMillisOfMostRecentStatusChange float64 14004048
18 timestampMillisOfNmrDueToMinStableCrhTime float64 14004048
19 currentMultiGroupStatus category 1750638
20 currentModelingMultiGroup float64 14004048
21 timestampMinuteOfFinalScoringOutput float64 14004048
22 timestampMillisOfFirstNmrDueToMinStableCrhTime float64 14004048
23 classification object 14004048
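In `noteStatusHistory`, `createdAtMillis` and the `timestampMillisOf...` columns are `float64`. In `notes` and `ratings`, `createdAtMillis` is `int64`. This is consistent with the `createdAtMillis` merge warnings at the start of this run.

The conversion loses no precision for millisecond timestamps. `float64` represents every integer up to 2**53 (about 9.0e15) exactly, while epoch milliseconds in early 2025 are about 1.7e12.

Below is a minimal sketch converting such columns to the nullable `Int64` dtype so they compare cleanly with the `int64` columns elsewhere. The sample values are illustrative. Selecting columns by the `timestampMillis` prefix is an assumption about which columns should be converted.

```python
import numpy as np
import pandas as pd

# Illustrative rows; NaN marks a note that never reached the status in question.
nsh = pd.DataFrame(
    {
        "noteId": np.array([101, 102], dtype="int64"),
        "createdAtMillis": [1735689600000.0, 1735776000000.0],
        "timestampMillisOfStatusLock": [np.nan, 1735862400000.0],
    }
)

# float64 holds integers exactly up to 2**53, far above current epoch milliseconds.
assert nsh["createdAtMillis"].max() < 2**53

millis_cols = ["createdAtMillis"] + [c for c in nsh.columns if c.startswith("timestampMillis")]
# astype("Int64") turns NaN into <NA>; it raises if any value has a fractional part.
converted = nsh.astype({c: "Int64" for c in millis_cols})
print(converted.dtypes)
```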
INFO:birdwatch.run_scoring:userEnrollment total RAM: 59362560 bytes (0.059 GB) | |
column dtype RAM | |
0 participantId object 8331584 | |
1 enrollmentState object 8331584 | |
2 successfulRatingNeededToEarnIn int64 8331584 | |
3 timestampOfLastStateChange int64 8331584 | |
4 timestampOfLastEarnOut float64 8331584 | |
5 modelingPopulation category 1041472 | |
6 modelingGroup float64 8331584 | |
7 numberOfTimesEarnedOut int64 8331584 | |
INFO:birdwatch.run_scoring:prescoringNoteModelOutput total RAM: 288245304 bytes (0.288 GB) | |
column dtype RAM | |
0 noteId object 64054512 | |
1 internalNoteIntercept float32 32027256 | |
2 internalNoteFactor1 float32 32027256 | |
3 scorerName object 64054512 | |
4 lowDiligenceNoteIntercept float32 32027256 | |
5 lowDiligenceNoteFactor1 float32 32027256 | |
6 lowDiligenceNoteInterceptRound2 float32 32027256 | |
INFO:birdwatch.run_scoring:prescoringRaterModelOutput total RAM: 401387130 bytes (0.401 GB) | |
column dtype RAM | |
0 raterParticipantId object 31793040 | |
1 internalRaterIntercept float32 15896520 | |
2 internalRaterFactor1 float32 15896520 | |
3 crhCrnhRatioDifference float64 31793040 | |
4 meanNoteScore float64 31793040 | |
5 raterAgreeRatio float64 31793040 | |
6 aboveHelpfulnessThreshold object 31793040 | |
7 scorerName object 31793040 | |
8 internalRaterReputation float32 15896520 | |
9 lowDiligenceRaterIntercept float32 15896520 | |
10 lowDiligenceRaterFactor1 float32 15896520 | |
11 lowDiligenceRaterReputation float32 15896520 | |
12 lowDiligenceRaterInterceptRound2 float32 15896520 | |
13 incorrectTagRatingsMadeByRater Int64 35767170 | |
14 totalRatingsMadeByRater float64 31793040 | |
15 postSelectionValue float64 31793040 | |
INFO:birdwatch.constants:Logging Prescoring Results RAM usage (before conversion) elapsed time: 0.05 secs (0.00 mins) | |
INFO:birdwatch.run_scoring:notes total RAM: 122768821 bytes (0.123 GB) | |
column dtype RAM | |
0 noteId int64 12432248 | |
1 noteAuthorParticipantId object 12432248 | |
2 createdAtMillis int64 12432248 | |
3 tweetId object 12432248 | |
4 classification object 12432248 | |
5 believable category 1554155 | |
6 harmful category 1554155 | |
7 validationDifficulty category 1554155 | |
8 misleadingOther Int8 3108062 | |
9 misleadingFactualError Int8 3108062 | |
10 misleadingManipulatedMedia Int8 3108062 | |
11 misleadingOutdatedInformation Int8 3108062 | |
12 misleadingMissingImportantContext Int8 3108062 | |
13 misleadingUnverifiedClaimAsFact Int8 3108062 | |
14 misleadingSatire Int8 3108062 | |
15 notMisleadingOther Int8 3108062 | |
16 notMisleadingFactuallyCorrect Int8 3108062 | |
17 notMisleadingOutdatedButNotWhenWritten Int8 3108062 | |
18 notMisleadingClearlySatire Int8 3108062 | |
19 notMisleadingPersonalOpinion Int8 3108062 | |
20 trustworthySources Int8 3108062 | |
21 summary object 12432248 | |
22 isMediaNote Int8 3108062 | |
INFO:birdwatch.run_scoring:ratings total RAM: 13133224872 bytes (13.133 GB)
    column                                dtype     RAM
0   noteId                                int64     946538720
1   raterParticipantId                    object    946538720
2   createdAtMillis                       int64     946538720
3   version                               Int8      236634680
4   agree                                 Int8      236634680
5   disagree                              Int8      236634680
6   helpful                               Int8      236634680
7   notHelpful                            Int8      236634680
8   helpfulnessLevel                      category  118317472
9   helpfulOther                          Int8      236634680
10  helpfulInformative                    Int8      236634680
11  helpfulClear                          Int8      236634680
12  helpfulEmpathetic                     Int8      236634680
13  helpfulGoodSources                    Int8      236634680
14  helpfulUniqueContext                  Int8      236634680
15  helpfulAddressesClaim                 Int8      236634680
16  helpfulImportantContext               Int8      236634680
17  helpfulUnbiasedLanguage               Int8      236634680
18  notHelpfulOther                       Int8      236634680
19  notHelpfulIncorrect                   Int8      236634680
20  notHelpfulSourcesMissingOrUnreliable  Int8      236634680
21  notHelpfulOpinionSpeculationOrBias    Int8      236634680
22  notHelpfulMissingKeyPoints            Int8      236634680
23  notHelpfulOutdated                    Int8      236634680
24  notHelpfulHardToUnderstand            Int8      236634680
25  notHelpfulArgumentativeOrBiased       Int8      236634680
26  notHelpfulOffTopic                    Int8      236634680
27  notHelpfulSpamHarassmentOrAbuse       Int8      236634680
28  notHelpfulIrrelevantSources           Int8      236634680
29  notHelpfulOpinionSpeculation          Int8      236634680
30  notHelpfulNoteNotNeeded               Int8      236634680
31  ratedOnTweetId                        int64     946538720
32  helpfulNum                            float64   946538720
33  postSelectionValue                    float64   946538720
34  postSelectionValue_note_author        float64   946538720
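The per-column RAM figures in these tables are consistent with pandas' shallow (`deep=False`) memory accounting. That is an inference from the numbers, not something the log states. Under that accounting, `int64`, `float64` and `object` columns cost 8 bytes per row (for `object`, only the pointer is counted, not the string), `float32` costs 4, the nullable `Int8` costs 2 (1 data byte plus 1 mask byte), and the nullable `Int64` costs 9. A `category` column costs about 1 byte of codes per row plus a small constant for the categories. Dividing a column's bytes by its per-row width therefore recovers the row count. For example, 946538720 / 8 = 118317340, which matches the "total ratings considered for pflip model" line further down. The sketch below measures those widths on a throwaway frame; `per_row_bytes` and `rows_from_bytes` are hypothetical helpers, not part of the scorer.

```python
import pandas as pd


def per_row_bytes(n: int = 1000) -> dict:
    """Measure shallow bytes per row for dtypes that appear in the RAM tables."""
    df = pd.DataFrame({
        "int64": pd.Series(range(n), dtype="int64"),
        "float32": pd.Series([0.0] * n, dtype="float32"),
        "Int8": pd.Series([1, None] * (n // 2), dtype="Int8"),  # nullable: data + mask
        "Int64": pd.Series(range(n), dtype="Int64"),
        "object": pd.Series(["a"] * n, dtype="object"),
    })
    # index=False drops the RangeIndex; deep defaults to False, so object
    # columns are counted as 8-byte pointers rather than full strings.
    usage = df.memory_usage(index=False)
    return {col: usage[col] / n for col in df.columns}


def rows_from_bytes(total_bytes: int, bytes_per_row: int) -> int:
    """Recover a row count from a logged per-column RAM figure (hypothetical helper)."""
    if total_bytes % bytes_per_row:
        raise ValueError("byte count is not a whole number of rows")
    return total_bytes // bytes_per_row


print(per_row_bytes())
print(rows_from_bytes(946538720, 8))  # ratings noteId (int64)
print(rows_from_bytes(236634680, 2))  # ratings version (Int8)
print(rows_from_bytes(35767170, 9))   # incorrectTagRatingsMadeByRater (Int64)
```

The same arithmetic puts the rater table at 3974130 rows before the concatenation below and 3975138 after it (31801104 / 8).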
INFO:birdwatch.run_scoring:noteStatusHistory total RAM: 225817062 bytes (0.226 GB)
    column                                          dtype     RAM
0   noteId                                          int64     14004048
1   noteAuthorParticipantId                         object    14004048
2   createdAtMillis                                 float64   14004048
3   timestampMillisOfFirstNonNMRStatus              float64   14004048
4   firstNonNMRStatus                               category  1750630
5   timestampMillisOfCurrentStatus                  float64   14004048
6   currentStatus                                   category  1750638
7   timestampMillisOfLatestNonNMRStatus             float64   14004048
8   mostRecentNonNMRStatus                          category  1750630
9   timestampMillisOfStatusLock                     float64   14004048
10  lockedStatus                                    category  1750638
11  timestampMillisOfRetroLock                      float64   14004048
12  currentCoreStatus                               category  1750638
13  currentExpansionStatus                          category  1750638
14  currentGroupStatus                              category  1750638
15  currentDecidedBy                                category  1751254
16  currentModelingGroup                            float64   14004048
17  timestampMillisOfMostRecentStatusChange         float64   14004048
18  timestampMillisOfNmrDueToMinStableCrhTime       float64   14004048
19  currentMultiGroupStatus                         category  1750638
20  currentModelingMultiGroup                       float64   14004048
21  timestampMinuteOfFinalScoringOutput             float64   14004048
22  timestampMillisOfFirstNmrDueToMinStableCrhTime  float64   14004048
23  classification                                  object    14004048
INFO:birdwatch.run_scoring:userEnrollment total RAM: 59362560 bytes (0.059 GB)
    column                          dtype     RAM
0   participantId                   object    8331584
1   enrollmentState                 object    8331584
2   successfulRatingNeededToEarnIn  int64     8331584
3   timestampOfLastStateChange      int64     8331584
4   timestampOfLastEarnOut          float64   8331584
5   modelingPopulation              category  1041472
6   modelingGroup                   float64   8331584
7   numberOfTimesEarnedOut          int64     8331584
INFO:birdwatch.run_scoring:prescoringNoteModelOutput total RAM: 288245304 bytes (0.288 GB)
    column                           dtype     RAM
0   noteId                           object    64054512
1   internalNoteIntercept            float32   32027256
2   internalNoteFactor1              float32   32027256
3   scorerName                       object    64054512
4   lowDiligenceNoteIntercept        float32   32027256
5   lowDiligenceNoteFactor1          float32   32027256
6   lowDiligenceNoteInterceptRound2  float32   32027256
INFO:birdwatch.run_scoring:prescoringRaterModelOutput total RAM: 401387130 bytes (0.401 GB)
    column                            dtype     RAM
0   raterParticipantId                object    31793040
1   internalRaterIntercept            float32   15896520
2   internalRaterFactor1              float32   15896520
3   crhCrnhRatioDifference            float64   31793040
4   meanNoteScore                     float64   31793040
5   raterAgreeRatio                   float64   31793040
6   aboveHelpfulnessThreshold         object    31793040
7   scorerName                        object    31793040
8   internalRaterReputation           float32   15896520
9   lowDiligenceRaterIntercept        float32   15896520
10  lowDiligenceRaterFactor1          float32   15896520
11  lowDiligenceRaterReputation       float32   15896520
12  lowDiligenceRaterInterceptRound2  float32   15896520
13  incorrectTagRatingsMadeByRater    Int64     35767170
14  totalRatingsMadeByRater           float64   31793040
15  postSelectionValue                float64   31793040
INFO:birdwatch.constants:Logging Prescoring Results RAM usage (after conversion) elapsed time: 0.05 secs (0.00 mins)
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/run_scoring.py, in run_prescoring, at line 1187: prescoringRaterModelOutput = pd.concat(
PandasTypeError: DataFrame concat on postSelectionValue: output=float64 inputs=[dtype('float64'), dtype('int64')] (allowed)
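The concat message above describes ordinary pandas dtype resolution. When the same column is `float64` in one input and `int64` in another, `pd.concat` upcasts the result to `float64`, and the type checker marks that as allowed. A minimal reproduction follows. Only the column name comes from the log; the frame names and values are invented.

```python
import pandas as pd

# One input already holds floats, the other plain integers (invented values).
scored = pd.DataFrame({"postSelectionValue": [0.5, 1.5]})
defaults = pd.DataFrame({"postSelectionValue": [0, 1]})

combined = pd.concat([scored, defaults], ignore_index=True)
print(combined["postSelectionValue"].dtype)     # float64
print(combined["postSelectionValue"].tolist())  # [0.5, 1.5, 0.0, 1.0]

# float64 holds integers exactly only up to 2**53, so large 64-bit IDs
# can silently change if they are ever upcast the same way.
big_id = 2**53 + 1
print(int(float(big_id)) == big_id)  # False
```

The upcast is harmless for small scores and for millisecond timestamps, which are far below 2**53. It is a real hazard for 64-bit identifiers, which is one reason a checker might treat a `float64` `noteId` as a mismatch.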
INFO:birdwatch.run_scoring:prescoringRaterModelOutput total RAM: 401488938 bytes (0.401 GB)
    column                            dtype     RAM
0   raterParticipantId                object    31801104
1   internalRaterIntercept            float32   15900552
2   internalRaterFactor1              float32   15900552
3   crhCrnhRatioDifference            float64   31801104
4   meanNoteScore                     float64   31801104
5   raterAgreeRatio                   float64   31801104
6   aboveHelpfulnessThreshold         object    31801104
7   scorerName                        object    31801104
8   internalRaterReputation           float32   15900552
9   lowDiligenceRaterIntercept        float32   15900552
10  lowDiligenceRaterFactor1          float32   15900552
11  lowDiligenceRaterReputation       float32   15900552
12  lowDiligenceRaterInterceptRound2  float32   15900552
13  incorrectTagRatingsMadeByRater    Int64     35776242
14  totalRatingsMadeByRater           float64   31801104
15  postSelectionValue                float64   31801104
INFO:birdwatch.constants:Logging Prescoring Results RAM usage (after concatenation) elapsed time: 0.02 secs (0.00 mins)
INFO:birdwatch.run_scoring:Initial value of OPENBLAS_NUM_THREADS: None
INFO:birdwatch.run_scoring:New value of OPENBLAS_NUM_THREADS: 1
INFO:birdwatch.pflip_model:seeding pflip: 0
INFO:birdwatch.pflip_model:total ratings considered for pflip model: 118317340
INFO:birdwatch.pflip_model:total ratings before initial note status for pflip model: 96900804
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/pflip_model.py, in _get_recent_notes, at line 303: noteStatusHistory[[c.noteIdKey, c.createdAtMillisKey]].merge(
PandasTypeError: Input mismatch on createdAtMillis: left=float64 vs right=int64 (UNALLOWED)
PandasTypeError: Merge key mismatch on createdAtMillis: left=float64 vs right=int64 (UNALLOWED)
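The noteStatusHistory RAM table earlier in this log lists `createdAtMillis` as `float64`, while the notes table lists it as `int64`. That is exactly the pair this merge check flags as UNALLOWED. One common way an integer column turns into `float64` is a left or outer merge that leaves some rows unmatched: pandas fills those rows with NaN, and NaN cannot be stored in an `int64` column. Whether that is what happened here is an assumption. The sketch below reproduces the upcast on invented data and shows one possible fix: cast both key columns to pandas' nullable `Int64` dtype before merging. Only the column names come from the log; the tables are made up.

```python
import pandas as pd

# Invented miniature tables; only the column names come from the log.
notes = pd.DataFrame({"noteId": [1, 2, 3], "createdAtMillis": [1000, 2000, 3000]})
status = pd.DataFrame({"noteId": [1, 2], "createdAtMillis": [1000, 2000]})

# noteId 3 has no status row, so the left merge fills NaN and upcasts to float64.
joined = notes[["noteId"]].merge(status, on="noteId", how="left")
upcast_dtype = str(joined["createdAtMillis"].dtype)
print(upcast_dtype)  # float64

# Nullable Int64 keeps whole numbers and stores the gap as <NA> instead of NaN.
history = joined.astype({"createdAtMillis": "Int64"})
notes_keyed = notes.astype({"createdAtMillis": "Int64"})

# With both key columns sharing one dtype, the merge keys no longer mismatch.
recent = history.merge(notes_keyed, on=["noteId", "createdAtMillis"], how="inner")
print(recent["noteId"].tolist())  # [1, 2]
```

Casting to `Int64` rather than back to `int64` matters here because plain `int64` cannot represent the missing value for noteId 3.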
INFO:birdwatch.pflip_model:labels before ScoringDriftGuard:
LABEL
CRH     124142
FLIP     50501
Name: count, dtype: int64
INFO:birdwatch.pflip_model:labels after ScoringDriftGuard:
LABEL
CRH     105059
FLIP     50501
Name: count, dtype: int64
INFO:birdwatch.pflip_model:labels after restricting to recent notes:
LABEL
CRH     74987
FLIP    33738
Name: count, dtype: int64
INFO:birdwatch.pflip_model:total ratings included in pflip model: 6944828
INFO:birdwatch.pflip_model:noteInfo summary: 17a54c392ccaf1174346230a91aba110c1444ea45587e11aa93db8ab98318fad
INFO:birdwatch.pflip_model:pflip training data size: 97852
INFO:birdwatch.pflip_model:trainDataFrame summary: bbeb8e07e7766d84ce88950ab885e25be16a46c4b383d2d031e8b832e9c1145b
INFO:birdwatch.pflip_model:pflip validation data size: 10873
INFO:birdwatch.pflip_model:validationDataFrame summary: 77183f09c5fda8dff9ad5f5895fa63746e27f4ae5d7a3a080226b15378d9004b
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/feature_extraction/text.py:525: UserWarning: The parameter 'token_pattern' will not be used since 'tokenizer' is not None'
  warnings.warn(
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/feature_extraction/text.py:525: UserWarning: The parameter 'token_pattern' will not be used since 'tokenizer' is not None'
  warnings.warn(
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 0 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 6 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 7 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 8 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 9 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 10 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 11 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 12 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 13 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 14 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 15 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 1 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 2 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 4 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 5 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 7 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 8 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 9 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 10 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 12 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 13 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 15 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 16 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 17 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 18 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 19 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 20 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 21 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 22 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 23 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 24 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 25 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 26 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 27 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 28 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 29 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 30 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 31 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 33 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 34 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 35 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 36 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 37 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 38 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 39 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 40 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 41 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 42 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 43 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:313: UserWarning: Bins whose width are too small (i.e., <= 1e-8) in feature 44 are removed. Consider decreasing the number of bins. | |
warnings.warn( | |
/home/ubuntu/.env/lib/python3.10/site-packages/sklearn/preprocessing/_discretization.py:239: FutureWarning: In version 1.5 onwards, subsample=200_000 will be used by default. Set subsample explicitly to silence this warning in the mean time. Set subsample=None to disable subsampling explicitly. | |
warnings.warn( | |
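The repeated `KBinsDiscretizer` warnings report that, for many features, adjacent bin edges landed within 1e-8 of each other and the empty bins were dropped. A typical cause is data-driven binning (for example, quantile binning) of a feature dominated by one repeated value, such as a tag column that is almost always 0. As far as we know, scikit-learn applies this check for its `quantile` and `kmeans` strategies; which strategy the pflip model uses is not shown in this log. The sketch below reproduces the edge-collapsing check with NumPy alone. It is an illustration, not scikit-learn's code, and `quantile_edges` and the toy data are invented.

```python
import numpy as np


def quantile_edges(values, n_bins):
    """Quantile bin edges with near-zero-width bins dropped (illustrative only)."""
    edges = np.quantile(np.asarray(values, dtype=float), np.linspace(0, 1, n_bins + 1))
    # Always keep the first edge; keep each later edge only if it sits more
    # than 1e-8 above its predecessor, the threshold quoted in the warning.
    keep = np.ediff1d(edges, to_begin=np.inf) > 1e-8
    return edges[keep]


# A feature that is 0 for 8 of 10 rows: most quantiles collapse onto 0.
edges = quantile_edges([0] * 8 + [1, 2], n_bins=4)
print(edges)           # [0. 2.]
print(len(edges) - 1)  # 1: only one of the 4 requested bins survives
```

This is why the warning suggests fewer bins. For sparse, mostly-constant features the extra edges carry no information and are simply discarded.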
INFO:birdwatch.pflip_model:Training Results:
INFO:birdwatch.pflip_model:threshold=-7.578478803355093 tpr=0.7369561648333607 fpr=0.24998145318040862 auc=0.8309798386552514
INFO:birdwatch.pflip_model:Validation Results:
INFO:birdwatch.pflip_model:threshold=-7.578478803355093 tpr=0.7100213219616205 fpr=0.2624505928853755 auc=0.8046833670640103
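The training and validation lines each report a true-positive rate and false-positive rate at one fixed threshold, plus the ROC AUC across all thresholds. The row counts fit a roughly 90/10 split: the 74987 CRH + 33738 FLIP notes left after the recent-notes restriction total 108725, which equals the 97852 training rows plus the 10873 validation rows. The plain-Python sketch below shows how the three metrics relate. It assumes that FLIP is the positive class and that `score >= threshold` predicts FLIP; the log does not state either convention, and the scores are invented. AUC is computed as the fraction of (positive, negative) pairs the score ranks correctly, with ties counting half, which equals the area under the ROC curve.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (tpr, fpr) when score >= threshold is predicted positive."""
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    return tp / pos, fp / neg


def roc_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented scores; label 1 = FLIP (assumed positive class), 0 = CRH.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
print(rates_at_threshold(scores, labels, threshold=0.5))  # (0.5, 0.3333333333333333)
print(roc_auc(scores, labels))                            # 0.8333333333333334
```

Moving the threshold trades TPR against FPR, while AUC does not depend on the threshold. That is why the log can show the same threshold for both splits while TPR, FPR and AUC all shift between training and validation.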
INFO:birdwatch.run_scoring:Final value of OPENBLAS_NUM_THREADS: None
INFO:birdwatch.constants:Fitting pflip model elapsed time: 342.78 secs (5.71 mins)
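Across the pflip step, the log shows `OPENBLAS_NUM_THREADS` as unset, then `1`, then unset again. That pattern suggests the variable is set only while the model is fitted and then restored. Below is a minimal context-manager sketch of that pattern; the `temporary_env` name is hypothetical and not taken from the scorer. One caveat: OpenBLAS generally reads this variable when the library is first loaded. Changing it later in the same process may therefore not affect a BLAS that is already initialized, although child processes started inside the block do inherit it. For in-process runtime control, a library such as `threadpoolctl` is the usual alternative.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name, value):
    """Set an environment variable inside the block, then restore the previous state."""
    previous = os.environ.get(name)  # None when the variable is unset
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(name, None)  # it was unset before, so unset it again
        else:
            os.environ[name] = previous


with temporary_env("OPENBLAS_NUM_THREADS", "1"):
    print(os.environ.get("OPENBLAS_NUM_THREADS"))  # 1
```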
INFO:birdwatch.run_scoring:We invoked run_scoring and are now in between prescoring and scoring.
INFO:birdwatch.run_scoring:Starting final scoring
INFO:birdwatch.run_scoring:notes total RAM: 122768821 bytes (0.123 GB)
    column                                  dtype     RAM
0   noteId                                  int64     12432248
1   noteAuthorParticipantId                 object    12432248
2   createdAtMillis                         int64     12432248
3   tweetId                                 object    12432248
4   classification                          object    12432248
5   believable                              category  1554155
6   harmful                                 category  1554155
7   validationDifficulty                    category  1554155
8   misleadingOther                         Int8      3108062
9   misleadingFactualError                  Int8      3108062
10  misleadingManipulatedMedia              Int8      3108062
11  misleadingOutdatedInformation           Int8      3108062
12  misleadingMissingImportantContext       Int8      3108062
13  misleadingUnverifiedClaimAsFact         Int8      3108062
14  misleadingSatire                        Int8      3108062
15  notMisleadingOther                      Int8      3108062
16  notMisleadingFactuallyCorrect           Int8      3108062
17  notMisleadingOutdatedButNotWhenWritten  Int8      3108062
18  notMisleadingClearlySatire              Int8      3108062
19  notMisleadingPersonalOpinion            Int8      3108062
20  trustworthySources                      Int8      3108062
21  summary                                 object    12432248
22  isMediaNote                             Int8      3108062
INFO:birdwatch.run_scoring:ratings total RAM: 11296408047 bytes (11.296 GB)
    column                                dtype     RAM
0   noteId                                int64     951276456
1   raterParticipantId                    object    951276456
2   createdAtMillis                       int64     951276456
3   version                               Int8      237819114
4   agree                                 Int8      237819114
5   disagree                              Int8      237819114
6   helpful                               Int8      237819114
7   notHelpful                            Int8      237819114
8   helpfulnessLevel                      category  118909689
9   helpfulOther                          Int8      237819114
10  helpfulInformative                    Int8      237819114
11  helpfulClear                          Int8      237819114
12  helpfulEmpathetic                     Int8      237819114
13  helpfulGoodSources                    Int8      237819114
14  helpfulUniqueContext                  Int8      237819114
15  helpfulAddressesClaim                 Int8      237819114
16  helpfulImportantContext               Int8      237819114
17  helpfulUnbiasedLanguage               Int8      237819114
18  notHelpfulOther                       Int8      237819114
19  notHelpfulIncorrect                   Int8      237819114
20  notHelpfulSourcesMissingOrUnreliable  Int8      237819114
21  notHelpfulOpinionSpeculationOrBias    Int8      237819114
22  notHelpfulMissingKeyPoints            Int8      237819114
23  notHelpfulOutdated                    Int8      237819114
24  notHelpfulHardToUnderstand            Int8      237819114
25  notHelpfulArgumentativeOrBiased       Int8      237819114
26  notHelpfulOffTopic                    Int8      237819114
27  notHelpfulSpamHarassmentOrAbuse       Int8      237819114
28  notHelpfulIrrelevantSources           Int8      237819114
29  notHelpfulOpinionSpeculation          Int8      237819114
30  notHelpfulNoteNotNeeded               Int8      237819114
31  ratedOnTweetId                        int64     951276456
32  helpfulNum                            float64   951276456
INFO:birdwatch.run_scoring:noteStatusHistory total RAM: 225817062 bytes (0.226 GB)
    column                                          dtype     RAM
0   noteId                                          int64     14004048
1   noteAuthorParticipantId                         object    14004048
2   createdAtMillis                                 float64   14004048
3   timestampMillisOfFirstNonNMRStatus              float64   14004048
4   firstNonNMRStatus                               category  1750630
5   timestampMillisOfCurrentStatus                  float64   14004048
6   currentStatus                                   category  1750638
7   timestampMillisOfLatestNonNMRStatus             float64   14004048
8   mostRecentNonNMRStatus                          category  1750630
9   timestampMillisOfStatusLock                     float64   14004048
10  lockedStatus                                    category  1750638
11  timestampMillisOfRetroLock                      float64   14004048
12  currentCoreStatus                               category  1750638
13  currentExpansionStatus                          category  1750638
14  currentGroupStatus                              category  1750638
15  currentDecidedBy                                category  1751254
16  currentModelingGroup                            float64   14004048
17  timestampMillisOfMostRecentStatusChange         float64   14004048
18  timestampMillisOfNmrDueToMinStableCrhTime       float64   14004048
19  currentMultiGroupStatus                         category  1750638
20  currentModelingMultiGroup                       float64   14004048
21  timestampMinuteOfFinalScoringOutput             float64   14004048
22  timestampMillisOfFirstNmrDueToMinStableCrhTime  float64   14004048
23  classification                                  object    14004048
INFO:birdwatch.run_scoring:userEnrollment total RAM: 59362560 bytes (0.059 GB)
    column                          dtype     RAM
0   participantId                   object    8331584
1   enrollmentState                 object    8331584
2   successfulRatingNeededToEarnIn  int64     8331584
3   timestampOfLastStateChange      int64     8331584
4   timestampOfLastEarnOut          float64   8331584
5   modelingPopulation              category  1041472
6   modelingGroup                   float64   8331584
7   numberOfTimesEarnedOut          int64     8331584
INFO:birdwatch.run_scoring:prescoringNoteModelOutput total RAM: 288245304 bytes (0.288 GB)
    column                           dtype     RAM
0   noteId                           object    64054512
1   internalNoteIntercept            float32   32027256
2   internalNoteFactor1              float32   32027256
3   scorerName                       object    64054512
4   lowDiligenceNoteIntercept        float32   32027256
5   lowDiligenceNoteFactor1          float32   32027256
6   lowDiligenceNoteInterceptRound2  float32   32027256
INFO:birdwatch.run_scoring:prescoringRaterModelOutput total RAM: 401488938 bytes (0.401 GB)
    column                            dtype     RAM
0   raterParticipantId                object    31801104
1   internalRaterIntercept            float32   15900552
2   internalRaterFactor1              float32   15900552
3   crhCrnhRatioDifference            float64   31801104
4   meanNoteScore                     float64   31801104
5   raterAgreeRatio                   float64   31801104
6   aboveHelpfulnessThreshold         object    31801104
7   scorerName                        object    31801104
8   internalRaterReputation           float32   15900552
9   lowDiligenceRaterIntercept        float32   15900552
10  lowDiligenceRaterFactor1          float32   15900552
11  lowDiligenceRaterReputation       float32   15900552
12  lowDiligenceRaterInterceptRound2  float32   15900552
13  incorrectTagRatingsMadeByRater    Int64     35776242
14  totalRatingsMadeByRater           float64   31801104
15  postSelectionValue                float64   31801104
INFO:birdwatch.constants:Logging Final Scoring RAM usage elapsed time: 0.06 secs (0.00 mins)
INFO:birdwatch.run_scoring:No previous scored notes passed; scoring all notes.
INFO:birdwatch.run_scoring:2. Rescore all recently created notes if not rescored at the minimum frequency.
INFO:birdwatch.run_scoring:Num notes created recently: 34324
INFO:birdwatch.run_scoring:3. Rescore all notes that flipped status in the previous scoring run. 47
INFO:birdwatch.run_scoring:4. Rescore all recently-flipped notes if not rescored at the minimum frequency.
INFO:birdwatch.run_scoring:Num notes flipped recently: 0
INFO:birdwatch.run_scoring:Num notes not rescored recently enough: 1691896
INFO:birdwatch.run_scoring:5. Rescore all notes that were NMRed due to MinStableCrhTime was not met. 24
INFO:birdwatch.run_scoring:6. Rescore recent unlocked notes that are eligible for locking 13312
INFO:birdwatch.run_scoring:----
Notes to rescore:
* 0 notes with new ratings since last scoring run.
* 30164 notes created recently and not rescored recently enough.
* 47 notes that flipped status in the previous scoring run.
* 0 notes that flipped status recently and not rescored recently enough.
* 24 notes that were NMRed due to MinStableCrhTime was not met.
* 13312 recent notes that are eligible to lock but haven't locked yet.
Overall: 43479 notes to rescore, out of 1554031 total.
----
INFO:birdwatch.constants:Determine which notes to score. elapsed time: 0.07 secs (0.00 mins)
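The six per-reason counts in the rescoring summary add up to 43547 (0 + 30164 + 47 + 0 + 24 + 13312), but the "Overall" line reports 43479. That gap is consistent with the overall figure being the size of the union of the per-reason note sets, so a note that qualifies for several reasons is counted once. Under that reading, the per-reason counts exceed the union by 68 memberships. The log does not confirm this, so treat the sketch below as an illustration of the assumption. The reason keys and note IDs are invented, and `rescore_counts` is a hypothetical helper.

```python
def rescore_counts(reasons):
    """Return (sum of per-reason counts, size of the deduplicated union)."""
    total = sum(len(ids) for ids in reasons.values())
    union = set().union(*reasons.values())
    return total, len(union)


# Invented note IDs grouped by hypothetical rescoring reasons.
reasons = {
    "createdRecently": {101, 102, 103},
    "flippedLastRun": {103},
    "eligibleToLock": {102, 104},
}
print(rescore_counts(reasons))  # (6, 4)

# The per-reason counts from the log summary, for comparison with "Overall: 43479".
print(sum([0, 30164, 47, 0, 24, 13312]))  # 43547
```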
INFO:birdwatch.process_data:Timestamp of latest rating in data: 2025-01-04 01:01:21.258000
INFO:birdwatch.process_data:Timestamp of latest note in data: 2025-01-04 01:01:14.426000
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_status_history.py, in merge_note_info, at line 31: newNoteStatusHistory = oldNoteStatusHistory.merge(
PandasTypeError: Input mismatch on createdAtMillis: left=float64 vs right=int64 (allowed)
PandasTypeError: Output mismatch on createdAtMillis_notes: result=float64 expected=int64 (allowed)
INFO:birdwatch.note_status_history:total notes added to noteStatusHistory: 0
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_status_history.py, in merge_note_info, at line 57: newNoteStatusHistory[[c.noteIdKey, c.createdAtMillisKey]].merge(
PandasTypeError: Input mismatch on createdAtMillis: left=float64 vs right=int64 (allowed)
PandasTypeError: Merge key mismatch on createdAtMillis: left=float64 vs right=int64 (allowed)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/process_data.py, in _filter_misleading_notes, at line 270: ratings = ratings.merge(
PandasTypeError: Input mismatch on createdAtMillis: left=int64 vs right=float64 (allowed)
INFO:birdwatch.process_data:Preprocess Data: Filter misleading notes, starting with 118909557 ratings on 1561961 notes
INFO:birdwatch.process_data: Keeping 85792913 ratings on 1051136 misleading notes
INFO:birdwatch.process_data: Keeping 8763256 ratings on 149656 deleted notes that were previously scored (in note status history)
INFO:birdwatch.process_data: Removing 0 ratings on 0 older notes that aren't deleted, but are not-misleading.
INFO:birdwatch.process_data: Removing 0 ratings on 0 notes that were deleted and not in note status history (e.g. old).
INFO:birdwatch.process_data:Num Ratings: 118909557, Num Unique Notes Rated: 1561961, Num Unique Raters: 1040729
INFO:birdwatch.constants:Preprocess smaller dataset since we skipped preprocessing at read time elapsed time: 451.52 secs (7.53 mins)
INFO:birdwatch.topic_model:Assigning notes to topics:
INFO:birdwatch.constants:Get Note Topics: Predict elapsed time: 79.46 secs (1.32 mins)
INFO:birdwatch.topic_model: Notes unassigned due to multiple matches: 1736
INFO:birdwatch.constants:Get Note Topics: Make Seed Labels elapsed time: 83.22 secs (1.39 mins)
INFO:birdwatch.topic_model: Post Topic assignment results: [888954 26545 54077 2347]
INFO:birdwatch.topic_model: Note Topic assignment results:
noteTopic
GazaConflict       112059
UkraineConflict     45446
MessiRonaldo         4027
Name: count, dtype: int64
INFO:birdwatch.constants:Get Note Topics: Merge and assign predictions elapsed time: 1.54 secs (0.03 mins)
INFO:birdwatch.constants:Note Topic Assignment elapsed time: 183.32 secs (3.06 mins)
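The topic-model lines report that 1736 notes were left unassigned because they matched more than one topic. A minimal sketch of such an exactly-one-match rule follows. It assumes each topic is represented by a keyword regular expression. The `SEED_PATTERNS` dictionary and its patterns are invented for illustration and are not the scorer's real seed terms; only the topic names come from the log.

```python
import re

# Hypothetical seed patterns; the scorer's actual terms are not shown in the log.
SEED_PATTERNS = {
    "GazaConflict": re.compile(r"\bgaza\b", re.IGNORECASE),
    "UkraineConflict": re.compile(r"\bukrain\w*", re.IGNORECASE),
    "MessiRonaldo": re.compile(r"\b(messi|ronaldo)\b", re.IGNORECASE),
}


def seed_topic(text):
    """Return the single matching topic, or None if zero or several topics match."""
    matches = [topic for topic, pattern in SEED_PATTERNS.items() if pattern.search(text)]
    return matches[0] if len(matches) == 1 else None


print(seed_topic("Aid convoy reaches Gaza"))            # GazaConflict
print(seed_topic("Messi and Ronaldo both scored"))       # MessiRonaldo
print(seed_topic("Gaza and Ukraine in the same post"))  # None (multiple matches)
```

Leaving ambiguous notes unlabeled keeps the seed labels clean at the cost of coverage, which fits the small unassigned count relative to the assigned totals.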
INFO:birdwatch.run_scoring:Post Selection Similarity Final Scoring: begin with 118909557 ratings.
/home/ubuntu/communitynotes/sourcecode/scoring/post_selection_similarity.py:111: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ratingsWithPostSelectionSimilarityValue.sort_values(
/home/ubuntu/communitynotes/sourcecode/scoring/post_selection_similarity.py:114: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ratingsWithPostSelectionSimilarityValue.drop_duplicates(
INFO:birdwatch.run_scoring:Post Selection Similarity Final Scoring: 118317340 ratings remaining.
INFO:birdwatch.constants:Post Selection Similarity: Final Scoring elapsed time: 268.67 secs (4.48 mins)
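The two `SettingWithCopyWarning`s appear to come from in-place calls (`sort_values` and `drop_duplicates`) on a DataFrame that pandas suspects is a slice of another frame. The exact code at post_selection_similarity.py lines 111 and 114 is not shown in this log, so the sketch below is a generic reproduction of the usual fix. It takes an explicit `.copy()` of the filtered frame before mutating it in place. Assigning the non-inplace result back to the variable works too. Column names come from the log; the data is invented.

```python
import warnings

import pandas as pd

ratings = pd.DataFrame({
    "noteId": [3, 1, 1, 2],
    "raterParticipantId": ["a", "b", "b", "c"],
    "postSelectionValue": [1, 2, 2, 0],
})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # .copy() makes the filtered frame independent of `ratings`, so the
    # in-place calls below do not trigger a chained-assignment warning.
    selected = ratings[ratings["postSelectionValue"] > 0].copy()
    selected.sort_values("noteId", inplace=True)
    selected.drop_duplicates(inplace=True)

print(selected["noteId"].tolist())  # [1, 3]
print([w.category.__name__ for w in caught])
```

Under pandas' Copy-on-Write mode this warning class goes away entirely, but the explicit copy keeps the intent clear on pandas 2.x as well.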
INFO:birdwatch.run_scoring:Starting parallel scorer execution with 23 scorers.
INFO:birdwatch.run_scoring:MFCoreScorer run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.run_scoring:MFExpansionScorer run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.run_scoring:MFExpansionPlusScorer run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.run_scoring:ReputationScorer run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.run_scoring:MFGroupScorer_13 run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.run_scoring:MFGroupScorer_12 run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.run_scoring:MFExpansionPlusScorer run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFExpansionPlusScorer run_scorer_parallelizable: Loading data elapsed time: 25.69 secs (0.43 mins)
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFExpansionPlusScorer set to: 12
INFO:birdwatch.run_scoring:MFCoreScorer run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFCoreScorer run_scorer_parallelizable: Loading data elapsed time: 25.84 secs (0.43 mins)
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFCoreScorer set to: 12
INFO:birdwatch.run_scoring:MFExpansionScorer run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFExpansionScorer run_scorer_parallelizable: Loading data elapsed time: 25.86 secs (0.43 mins)
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFExpansionScorer set to: 12
INFO:birdwatch.run_scoring:MFGroupScorer_12 run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.run_scoring:ReputationScorer run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFGroupScorer_12 run_scorer_parallelizable: Loading data elapsed time: 25.82 secs (0.43 mins)
INFO:birdwatch.constants:ReputationScorer run_scorer_parallelizable: Loading data elapsed time: 25.85 secs (0.43 mins)
INFO:birdwatch.scorer:score_final: Torch intra-op parallelism for ReputationScorer set to: 12 | |
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_12 set to: 4 | |
INFO:birdwatch.run_scoring:MFGroupScorer_13 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_13 run_scorer_parallelizable: Loading data elapsed time: 26.15 secs (0.44 mins) | |
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_13 set to: 8 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_12. Original rating length: 118317340 | |
INFO:birdwatch.scorer: Ratings after topic filter: 118317340 | |
INFO:birdwatch.scorer:Filtering ratings for MFExpansionPlusScorer. Original rating length: 118317340 | |
INFO:birdwatch.scorer: Ratings after topic filter: 118317340 | |
INFO:birdwatch.scorer:Filtering ratings for MFCoreScorer. Original rating length: 118317340 | |
INFO:birdwatch.scorer: Ratings after topic filter: 118317340 | |
INFO:birdwatch.scorer:Filtering ratings for ReputationScorer. Original rating length: 118317340 | |
INFO:birdwatch.scorer: Ratings after topic filter: 118317340 | |
INFO:birdwatch.scorer:Filtering ratings for MFExpansionScorer. Original rating length: 118317340 | |
INFO:birdwatch.scorer: Ratings after topic filter: 118317340 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_13. Original rating length: 118317340 | |
INFO:birdwatch.scorer: Ratings after topic filter: 118317340 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings after group filter: 764628 | |
INFO:birdwatch.scorer:MFGroupScorer_12 Filter input elapsed time: 40.13 secs (0.67 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note: | |
Num Ratings: 458397, Num Unique Notes Rated: 32125, Num Unique Raters: 11120 | |
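The filter reported above keeps only notes that have at least 5 distinct raters. The per-rater floor of 0 ratings imposes no constraint. The sketch below is a minimal single-pass version of such a filter. It assumes a long-format ratings frame with `noteId` and `raterParticipantId` columns (names taken from the log) and uses a hypothetical helper `filter_ratings`. It is not the scorer's implementation, which may differ in detail.

```python
import pandas as pd


def filter_ratings(
    ratings: pd.DataFrame, min_ratings_per_rater: int, min_raters_per_note: int
) -> pd.DataFrame:
    """Hypothetical single-pass filter: drop sparse raters, then thinly rated notes."""
    # Number of ratings made by each rater, broadcast back to every row.
    per_rater = ratings.groupby("raterParticipantId")["noteId"].transform("size")
    kept = ratings[per_rater >= min_ratings_per_rater]
    # Number of distinct raters per note, broadcast back to every row.
    per_note = kept.groupby("noteId")["raterParticipantId"].transform("nunique")
    kept = kept[per_note >= min_raters_per_note]
    print(
        f"Num Ratings: {len(kept)}, "
        f"Num Unique Notes Rated: {kept['noteId'].nunique()}, "
        f"Num Unique Raters: {kept['raterParticipantId'].nunique()}"
    )
    return kept


# Toy data: note 10 has 5 distinct raters, note 20 has only 2.
toy = pd.DataFrame(
    {
        "noteId": [10, 10, 10, 10, 10, 20, 20],
        "raterParticipantId": ["a", "b", "c", "d", "e", "a", "b"],
    }
)
result = filter_ratings(toy, min_ratings_per_rater=0, min_raters_per_note=5)
```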
INFO:birdwatch.scorer:MFGroupScorer_12 Prepare ratings elapsed time: 0.26 secs (0.00 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey]
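This `FutureWarning` fires when `fillna` runs on an object-dtype column and pandas silently downcasts the result. The usual case is a True/False flag that has picked up missing values. The warning text suggests opting into `future.no_silent_downcasting` and calling `infer_objects(copy=False)`.

The sketch below takes a different route, assuming the column holds only True, False or missing values. It converts to pandas' nullable `boolean` dtype before filling, so `fillna` never operates on an object array. The `flags` series is hypothetical.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical flag column: True/False mixed with NaN is stored as object dtype.
flags = pd.Series([True, np.nan, False, np.nan], dtype="object")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # NaN becomes <NA> in the nullable "boolean" dtype; filling and casting to
    # plain bool then involves no object-to-bool downcast inside fillna.
    cleaned = flags.astype("boolean").fillna(False).astype(bool)

downcast_warnings = [w for w in caught if "Downcasting" in str(w.message)]
print(cleaned.tolist(), cleaned.dtype, len(downcast_warnings))
```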
INFO:birdwatch.helpfulness_scores:Unique Raters: 4847
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 22716
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 5539
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 4847
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 320638
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 320638
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 4847, Notes: 32100
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
  noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
  noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
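The chained-assignment warning comes from the pattern `df[col].fillna(value, inplace=True)`. The intermediate `df[col]` behaves as a copy, so from pandas 3.0 the in-place fill would never reach `df`.

Below is a minimal sketch of the two replacements the message itself suggests. It uses a hypothetical frame; only the column name echoes the log.

```python
import numpy as np
import pandas as pd

# Hypothetical initial-state frame; only the column name echoes the log.
noteInit = pd.DataFrame(
    {"noteId": [1, 2, 3], "internalNoteIntercept": [0.5, np.nan, -0.2]}
)

# Option 1: call the method on the DataFrame itself with a {column: value} mapping.
option1 = noteInit.copy()
option1.fillna({"internalNoteIntercept": 0.0}, inplace=True)

# Option 2: assign the filled Series back to the column explicitly.
option2 = noteInit.copy()
option2["internalNoteIntercept"] = option2["internalNoteIntercept"].fillna(0.0)

print(option1["internalNoteIntercept"].tolist())
```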
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:initialized global intercept
INFO:birdwatch.matrix_factorization:learning rate set to :0.2
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 9.988722741433023
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 66.15184650299155
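The two density figures appear to be plain ratios of sizes logged a few lines earlier: 320638 ratings, 4847 users and 32100 notes. The check below reproduces both values, which supports that reading but does not prove how the scorer computes them.

```python
# Figures copied from the log lines above.
num_ratings = 320638  # "Number of Ratings for Final Training"
num_users = 4847      # "Users: 4847"
num_notes = 32100     # "Notes: 32100"

ratings_per_note = num_ratings / num_notes
ratings_per_user = num_ratings / num_users

print(ratings_per_note, ratings_per_user)
```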
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 10.237497984846042
INFO:birdwatch.model:Freezing parameter: user_factors.weight
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight
INFO:birdwatch.model:Freezing parameter: global_intercept
INFO:birdwatch.matrix_factorization:epoch 0 0.15517044067382812
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10627015680074692
INFO:birdwatch.matrix_factorization:epoch 20 0.10270244628190994
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0666632205247879
INFO:birdwatch.matrix_factorization:epoch 40 0.09921582043170929
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06453729420900345
INFO:birdwatch.matrix_factorization:Num epochs: 51
INFO:birdwatch.matrix_factorization:epoch 51 0.0990028977394104
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06464157998561859
INFO:birdwatch.matrix_factorization:Global Intercept: 0.18789513409137726
INFO:birdwatch.scorer:MFGroupScorer_12 Final helpfulness-filtered MF elapsed time: 2.82 secs (0.05 mins)
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_12 final scoring, about to call diligence with 320638 final round ratings.
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key]
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key]
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
                    noteId  ...  internalNoteInterceptRound2
0      1691896445111087535  ...                    -1.797996
1      1710861026604834963  ...                    -0.439698
2      1710982119822852314  ...                     4.282034
3      1712851765647876276  ...                    -2.332320
4      1712851975610487188  ...                    -0.878862
...                    ...  ...                          ...
31010  1845648344003027373  ...                    -0.412559
31011  1864044809939243091  ...                    -0.362962
31012  1714417073764421719  ...                    -0.457417
31013  1797659363445788823  ...                    -0.535873
31014  1828127526306169170  ...                    -0.535034
[31015 rows x 4 columns],
raterInitState:
                                      raterParticipantId  ...  internalRaterInterceptRound2
0      00029D1FDD352D79B5073189C3F2BDF6377581F50D66C1...  ...                           NaN
1      00053CDCAC04E3692F4A01305C8F3D093CCE221157D539...  ...                           NaN
2      0005983E6E18862483AB372C5B61FEBC1F8A573E7701F9...  ...                           NaN
3      000957CF1421B543AEAFEBF835033D3BA5FB1B99FB0AF8...  ...                           NaN
4      001041D12A03F39CCB40BEA9458C469323254EEC76348B...  ...                     -0.186429
...                                                  ...  ...                           ...
22711  FFE87CF4860C52665B228E9F345BB3EE183994416FA6D7...  ...                           NaN
22712  FFEEE02BCED1134EB1C57875779C03F2135B72BB4C8E7F...  ...                      0.387092
22713  FFF3E935633C6870DE7674D0681C5821BC408073C84A36...  ...                           NaN
22714  FFFA40CBF0CC13E71072BFE89E80372A5907BD9D2EDA54...  ...                           NaN
22715  FFFA43EFB0AAB3BFD273666FF123BFE69D863B9A2F5E44...  ...                           NaN
[22716 rows x 5 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 4847, vs. num we are initializing: 22716
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 4847
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 4847, vs. num we are initializing: 22716
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 4847
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 4847, vs. num we are initializing: 22716
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 4847
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 32100, vs. num we are initializing: 31015
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 31505
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 595
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 32100, vs. num we are initializing: 31015
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 31505
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 595
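In each block above, the uninitialized and initialized counts add up to the number of entities in the dataset, not to the size of the init-state table. For raters, 0 + 4847 = 4847 (against 22716 rows). For notes, 31505 + 595 = 32100 (against 31015 rows).

That pattern fits a simple model: look up each dataset entity in the init-state table and count the non-missing values. The sketch below is a hypothetical illustration of that bookkeeping, not the library's code. It uses toy data and an invented helper name, `count_initialized`.

```python
import numpy as np
import pandas as pd


def count_initialized(
    dataset_ids: pd.Series, init_state: pd.DataFrame, id_col: str, param_col: str
) -> tuple[int, int]:
    """Hypothetical helper: how many dataset entities receive a warm-start value."""
    # Left-join so every distinct entity in the dataset appears exactly once.
    looked_up = pd.DataFrame({id_col: dataset_ids.unique()}).merge(
        init_state[[id_col, param_col]], on=id_col, how="left"
    )
    initialized = int(looked_up[param_col].notna().sum())
    uninitialized = len(looked_up) - initialized
    return uninitialized, initialized


# Toy data: notes 1-4 are in the dataset; the init table also holds note 9,
# and only notes 1 and 2 have a stored intercept.
dataset_notes = pd.Series([1, 2, 3, 4, 4])
init_state = pd.DataFrame(
    {"noteId": [1, 2, 3, 9], "internalNoteIntercept": [0.1, -0.3, np.nan, 0.7]}
)

uninit, init = count_initialized(dataset_notes, init_state, "noteId", "internalNoteIntercept")
print(f"uninitialized: {uninit}, initialized: {init}")
```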
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor)
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=5.860401 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.642860 | time=0.3s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.612004 | time=0.6s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.608824 | time=0.9s
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.608517 | time=1.2s
INFO:birdwatch.reputation_matrix_factorization:epoch=140 | loss=2.608467 | time=1.4s
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 4847, vs. num we are initializing: 22716
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 4847
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.542247 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.480972 | time=0.3s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.479966 | time=0.6s
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.479932 | time=0.8s
INFO:birdwatch.diligence_model:Low diligence final loss: 0.4799
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object')
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1',
       'internalNoteFactor1_max', 'internalNoteFactor1_median',
       'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig',
       'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig',
       'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac',
       'internalNoteIntercept_max', 'internalNoteIntercept_min',
       'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'],
      dtype='object')
INFO:birdwatch.scorer:MFGroupScorer_12 Low Diligence Reputation Model elapsed time: 2.76 secs (0.05 mins)
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFGroupScorer_12
INFO:birdwatch.scorer: Ratings after group filter: 35062479
INFO:birdwatch.scorer:MFGroupScorer_13 Filter input elapsed time: 47.57 secs (0.79 mins)
INFO:birdwatch.mf_base_scorer:seeding with 0
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
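The `MERGE ERROR(S)` and `CONCAT ERROR(S)` entries appear to be reported by dtype checks in the scoring code itself. The `(allowed)` suffix suggests that these particular mismatches are tolerated. Several of them record the same effect: an `int64` column comes out of a merge as `float64`.

A common pandas cause is an outer or left merge that leaves some rows without a match. Those rows are filled with NaN, and NumPy's `int64` cannot hold NaN, so the column is upcast. The sketch below shows that behavior on hypothetical data, along with one remedy: pandas' nullable `Int64` dtype.

```python
import pandas as pd

# Hypothetical note stats: note 3 has no matching row in `counts`.
notes = pd.DataFrame({"noteId": [1, 2, 3]})
counts = pd.DataFrame({"noteId": [1, 2], "numRatings": [7, 4]})

# A left merge fills the unmatched row with NaN, forcing int64 -> float64.
plain = notes.merge(counts, on="noteId", how="left")

# Casting to the nullable Int64 dtype first keeps integers and uses <NA> instead.
nullable = notes.merge(
    counts.astype({"numRatings": "Int64"}), on="noteId", how="left"
)

print(plain["numRatings"].dtype, nullable["numRatings"].dtype)
```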
INFO:birdwatch.scorer: Ratings after group filter: 118317340
INFO:birdwatch.scorer:MFExpansionPlusScorer Filter input elapsed time: 49.35 secs (0.82 mins)
INFO:birdwatch.mf_base_scorer:seeding with 0
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 494: noteStats = tagAggregates.merge(noteStats, on=c.noteIdKey, how="outer")
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
INFO:birdwatch.constants:compute_scored_notes: compute tag aggregates elapsed time: 2.61 secs (0.04 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:82: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ratings_w_user_totals.drop(columns_to_drop, inplace=True, axis=1)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:84: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ratings_w_user_totals[c.incorrectTagRateByRaterKey] = (
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
  ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:91: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ratings_w_user_totals[
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
  ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:95: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey] = ratings_w_user_totals[
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
  ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:99: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ratings_w_user_totals[c.totalRatingsMadeByRaterKey] = ratings_w_user_totals[
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 499: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on num_voters_interval: result=float64 expected=int64 (allowed)
INFO:birdwatch.constants:compute_scored_notes: compute incorrect aggregates elapsed time: 2.76 secs (0.05 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.scorer: Ratings after group filter: 102142736
INFO:birdwatch.scorer:MFCoreScorer Filter input elapsed time: 56.82 secs (0.95 mins)
INFO:birdwatch.mf_base_scorer:seeding with 0
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.63 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.73 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.64 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.scorer: Ratings after group filter: 102142736
INFO:birdwatch.scorer:ReputationScorer Filter input elapsed time: 58.71 secs (0.98 mins)
INFO:birdwatch.reputation_scorer:seeding with 0
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.63 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.71 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRHInertia (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRHInertia (v1.0) elapsed time: 0.95 secs (0.02 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRHInertia (v1.0) elapsed time: 1.60 secs (0.03 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: TagFilter (v1.0)
INFO:birdwatch.scorer: Ratings after group filter: 118315149
INFO:birdwatch.scorer:MFExpansionScorer Filter input elapsed time: 61.57 secs (1.03 mins)
INFO:birdwatch.mf_base_scorer:seeding with 0
INFO:birdwatch.scoring_rules:Candidate notes prior to tag filtering: 1748648
INFO:birdwatch.scoring_rules:Checking note tags:
INFO:birdwatch.scoring_rules:notHelpfulOther
INFO:birdwatch.scoring_rules:notHelpfulIncorrect
INFO:birdwatch.scoring_rules:notHelpfulSourcesMissingOrUnreliable
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculationOrBias
INFO:birdwatch.scoring_rules:notHelpfulMissingKeyPoints
INFO:birdwatch.scoring_rules:notHelpfulOutdated
INFO:birdwatch.scoring_rules:notHelpfulHardToUnderstand
INFO:birdwatch.scoring_rules:outlier filtering disabled for tag: notHelpfulHardToUnderstand
INFO:birdwatch.scoring_rules:notHelpfulArgumentativeOrBiased
INFO:birdwatch.scoring_rules:notHelpfulOffTopic
INFO:birdwatch.scoring_rules:notHelpfulSpamHarassmentOrAbuse
INFO:birdwatch.scoring_rules:notHelpfulIrrelevantSources
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculation
INFO:birdwatch.scoring_rules:notHelpfulNoteNotNeeded
INFO:birdwatch.scoring_rules:Total {note, tag} pairs where tag filter logic triggered: 895
INFO:birdwatch.scoring_rules:Total unique notes impacted by tag filtering: 549
INFO:birdwatch.constants:Calling score_notes: TagFilter (v1.0) elapsed time: 2.94 secs (0.05 mins)
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scoring_rules.py, in apply_scoring_rules, at line 1099: noteColumns = noteColumns.merge(
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
INFO:birdwatch.constants:Applying scoring rule: TagFilter (v1.0) elapsed time: 3.58 secs (0.06 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: CRHSuperThreshold (v1.0)
INFO:birdwatch.constants:Calling score_notes: CRHSuperThreshold (v1.0) elapsed time: 0.12 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: CRHSuperThreshold (v1.0) elapsed time: 0.77 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: ElevatedCRHInertia (v1.0)
INFO:birdwatch.constants:Calling score_notes: ElevatedCRHInertia (v1.0) elapsed time: 1.08 secs (0.02 mins)
INFO:birdwatch.constants:Applying scoring rule: ElevatedCRHInertia (v1.0) elapsed time: 1.67 secs (0.03 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterIncorrect (v1.0)
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note:
Num Ratings: 34299271, Num Unique Notes Rated: 609958, Num Unique Raters: 215930
INFO:birdwatch.scorer:MFGroupScorer_13 Prepare ratings elapsed time: 19.93 secs (0.33 mins)
INFO:birdwatch.scoring_rules:Total notes impacted by incorrect filtering: 180
INFO:birdwatch.constants:Calling score_notes: FilterIncorrect (v1.0) elapsed time: 2.85 secs (0.05 mins)
INFO:birdwatch.constants:Applying scoring rule: FilterIncorrect (v1.0) elapsed time: 3.45 secs (0.06 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLowDiligence (v1.0)
INFO:birdwatch.scoring_rules:Total notes impacted by low diligence filtering: 7284
INFO:birdwatch.constants:Calling score_notes: FilterLowDiligence (v1.0) elapsed time: 2.78 secs (0.05 mins)
INFO:birdwatch.constants:Applying scoring rule: FilterLowDiligence (v1.0) elapsed time: 3.39 secs (0.06 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLargeFactor (v1.0)
INFO:birdwatch.scoring_rules:Total notes impacted by large factor filtering: 20
INFO:birdwatch.constants:Calling score_notes: FilterLargeFactor (v1.0) elapsed time: 0.82 secs (0.01 mins)
INFO:birdwatch.constants:Applying scoring rule: FilterLargeFactor (v1.0) elapsed time: 1.44 secs (0.02 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey]
INFO:birdwatch.helpfulness_scores:Unique Raters: 103061
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 222125
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 109801
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note:
Num Ratings: 101479596, Num Unique Notes Rated: 1207469, Num Unique Raters: 784040
INFO:birdwatch.scorer:MFCoreScorer Prepare ratings elapsed time: 54.52 secs (0.91 mins)
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 103061
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 19815879
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 19815879
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 38.61 secs (0.64 mins)
INFO:birdwatch.mf_base_scorer:sn cols: Index(['noteId', 'ratingWeight', 'notHelpfulOtherAdjusted',
       'notHelpfulIncorrectAdjusted',
       'notHelpfulSourcesMissingOrUnreliableAdjusted',
       'notHelpfulOpinionSpeculationOrBiasAdjusted',
       'notHelpfulMissingKeyPointsAdjusted', 'notHelpfulOutdatedAdjusted',
       'notHelpfulHardToUnderstandAdjusted',
       'notHelpfulArgumentativeOrBiasedAdjusted', 'notHelpfulOffTopicAdjusted',
       'notHelpfulSpamHarassmentOrAbuseAdjusted',
       'notHelpfulIrrelevantSourcesAdjusted',
       'notHelpfulOpinionSpeculationAdjusted',
       'notHelpfulNoteNotNeededAdjusted', 'notHelpfulOtherAdjustedRatio',
       'notHelpfulIncorrectAdjustedRatio',
       'notHelpfulSourcesMissingOrUnreliableAdjustedRatio',
       'notHelpfulOpinionSpeculationOrBiasAdjustedRatio',
       'notHelpfulMissingKeyPointsAdjustedRatio',
       'notHelpfulOutdatedAdjustedRatio',
       'notHelpfulHardToUnderstandAdjustedRatio',
       'notHelpfulArgumentativeOrBiasedAdjustedRatio',
       'notHelpfulOffTopicAdjustedRatio',
       'notHelpfulSpamHarassmentOrAbuseAdjustedRatio',
       'notHelpfulIrrelevantSourcesAdjustedRatio',
       'notHelpfulOpinionSpeculationAdjustedRatio',
       'notHelpfulNoteNotNeededAdjustedRatio', 'helpfulOther',
       'helpfulInformative', 'helpfulClear', 'helpfulEmpathetic',
       'helpfulGoodSources', 'helpfulUniqueContext', 'helpfulAddressesClaim',
       'helpfulImportantContext', 'helpfulUnbiasedLanguage', 'notHelpfulOther',
       'notHelpfulIncorrect', 'notHelpfulSourcesMissingOrUnreliable',
       'notHelpfulOpinionSpeculationOrBias', 'notHelpfulMissingKeyPoints',
       'notHelpfulOutdated', 'notHelpfulHardToUnderstand',
       'notHelpfulArgumentativeOrBiased', 'notHelpfulOffTopic',
       'notHelpfulSpamHarassmentOrAbuse', 'notHelpfulIrrelevantSources',
       'notHelpfulOpinionSpeculation', 'notHelpfulNoteNotNeeded', 'numRatings',
       'noteAuthorParticipantId', 'classification', 'currentStatus',
       'internalNoteIntercept', 'internalNoteFactor1',
       'lowDiligenceNoteIntercept', 'internalNoteFactor1_max',
       'internalNoteFactor1_median', 'internalNoteFactor1_min',
       'internalNoteFactor1_refit_orig', 'internalNoteIntercept_median',
       'internalNoteIntercept_refit_orig', 'ratingCount_all',
       'ratingCount_neg_fac', 'ratingCount_pos_fac',
       'internalNoteIntercept_max', 'internalNoteIntercept_min',
       'notHelpfulIncorrect_interval', 'p_incorrect_user_interval',
       'num_voters_interval', 'tf_idf_incorrect_interval',
       'internalRatingStatus', 'internalActiveRules', 'activeFilterTags',
       'crhBool', 'crnhBool', 'awaitingBool'],
      dtype='object')
INFO:birdwatch.scorer:MFGroupScorer_12 Final compute scored notes elapsed time: 68.30 secs (1.14 mins)
INFO:birdwatch.scorer:Postprocessing output for MFGroupScorer_12
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note:
Num Ratings: 117720044, Num Unique Notes Rated: 1298934, Num Unique Raters: 1040518
INFO:birdwatch.scorer:MFExpansionPlusScorer Prepare ratings elapsed time: 68.94 secs (1.15 mins)
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 103061, Notes: 609122
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
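The FutureWarning above fires because `fillna(..., inplace=True)` is called on a single-column selection such as `noteInit[col]`, which pandas 3.0 will always treat as a copy, so the fill would silently stop reaching `noteInit`. The scorer source is not part of this log, so what follows is only a minimal sketch on a made-up frame; `fill_assign` and `fill_mapping` are hypothetical helpers showing the two replacements the warning text itself suggests.

```python
import numpy as np
import pandas as pd


def fill_assign(df: pd.DataFrame, col: str, value: float) -> pd.DataFrame:
    """Suggested form 1: assign the filled Series back to the column (df[col] = df[col].fillna(v))."""
    out = df.copy()  # work on a copy so the example leaves its input untouched
    out[col] = out[col].fillna(value)
    return out


def fill_mapping(df: pd.DataFrame, col: str, value: float) -> pd.DataFrame:
    """Suggested form 2: call fillna on the DataFrame with a {column: value} mapping."""
    out = df.copy()
    out.fillna({col: value}, inplace=True)  # in-place on the frame itself, not on a column view
    return out


# Hypothetical stand-in for noteInit, with one note lacking an initial intercept.
noteInit = pd.DataFrame({"noteId": [1, 2, 3], "internalNoteIntercept": [0.5, np.nan, -0.2]})

print(fill_assign(noteInit, "internalNoteIntercept", 0.0)["internalNoteIntercept"].tolist())
print(fill_mapping(noteInit, "internalNoteIntercept", 0.0)["internalNoteIntercept"].tolist())
```

Both calls print `[0.5, 0.0, -0.2]`. Applied directly to the logged line, form 1 would read `noteInit[col] = noteInit[col].fillna(0.0)`.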
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:initialized global intercept
INFO:birdwatch.matrix_factorization:learning rate set to :0.2
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 32.53187210443885
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 192.27330415967242
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 32.61067555088812
INFO:birdwatch.model:Freezing parameter: user_factors.weight
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight
INFO:birdwatch.model:Freezing parameter: global_intercept
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note:
Num Ratings: 117717845, Num Unique Notes Rated: 1298931, Num Unique Raters: 1040483
INFO:birdwatch.scorer:MFExpansionScorer Prepare ratings elapsed time: 64.44 secs (1.07 mins)
INFO:birdwatch.matrix_factorization:epoch 0 0.12420346587896347
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10024061053991318
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey]
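This downcasting warning appears when `fillna(False)` runs on an object-dtype column holding a mix of `True`/`False` and `NaN` (a typical result of a left merge), and pandas then quietly converts the result to `bool`. One way to avoid relying on that deprecated conversion, sketched here as an assumption rather than the scorer's own fix, is to pass through pandas' nullable `"boolean"` dtype explicitly. `fill_bool_flag` is a hypothetical helper and the data are made up; the series stands in for `helpfulnessScores[c.aboveHelpfulnessThresholdKey]`.

```python
import numpy as np
import pandas as pd


def fill_bool_flag(series: pd.Series, default: bool = False) -> pd.Series:
    """Fill gaps in an object-dtype True/False/NaN column and return a plain bool Series.

    Casting to the nullable "boolean" dtype first means fillna never has to
    downcast an object array, which is the step the FutureWarning deprecates.
    """
    return series.astype("boolean").fillna(default).astype(bool)


# Hypothetical flags after a merge left one rater without a value.
flags = pd.Series([True, np.nan, False], dtype=object)
result = fill_bool_flag(flags)
print(result.dtype, result.tolist())
```

This prints `bool [True, False, False]`. The alternative the warning names, opting in with `future.no_silent_downcasting` and then calling `infer_objects(copy=False)` on the filled result, depends on pandas 2.2 or later.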
INFO:birdwatch.process_data:After applying min 10 ratings per rater and min 5 raters per note:
Num Ratings: 100691291, Num Unique Notes Rated: 1205894, Num Unique Raters: 590255
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey]
INFO:birdwatch.matrix_factorization:epoch 20 0.10472938418388367
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07926952838897705
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey]
INFO:birdwatch.scorer: Original noteScores length: 1750506
INFO:birdwatch.scorer: Final noteScores length: 4928
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in score_final, at line 1190: noteScores = noteScores.merge(
PandasTypeError: Output mismatch on numFinalRoundRatings: result=float64 expected=int64 (allowed)
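The "(allowed)" int64 to float64 mismatch above is the usual signature of a left or outer merge in which some rows find no match: pandas fills those cells with `NaN`, and a NumPy `int64` column cannot hold `NaN`, so it is upcast to `float64`. The actual merge arguments at `mf_base_scorer.py` line 1190 are not visible in this log, so this is only a toy reproduction with invented frames, plus two common ways to keep an integer dtype.

```python
import pandas as pd

notes = pd.DataFrame({"noteId": [1, 2, 3]})
# Invented counts; note 2 has no entry, so the left merge cannot fill it.
counts = pd.DataFrame({"noteId": [1, 3], "numFinalRoundRatings": [12, 7]})

merged = notes.merge(counts, on="noteId", how="left")
print(merged["numFinalRoundRatings"].dtype)  # float64: the missing match became NaN

# Option A: keep the gap as <NA> by using pandas' nullable integer dtype before merging.
nullable = notes.merge(
    counts.astype({"numFinalRoundRatings": "Int64"}), on="noteId", how="left"
)

# Option B: only if a missing count genuinely means zero, fill and cast back to int64.
filled = merged.fillna({"numFinalRoundRatings": 0}).astype({"numFinalRoundRatings": "int64"})

print(nullable["numFinalRoundRatings"].dtype, filled["numFinalRoundRatings"].tolist())
```

Whether a missing value should mean zero or "unknown" is a property of the scorer's data that this log does not settle; option A preserves the distinction, option B discards it.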
INFO:birdwatch.scorer:MFGroupScorer_12 Postprocess output elapsed time: 51.87 secs (0.86 mins)
INFO:birdwatch.run_scoring:MFGroupScorer_11 run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.matrix_factorization:epoch 40 0.10362616181373596
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07790735363960266
INFO:birdwatch.run_scoring:MFGroupScorer_11 run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFGroupScorer_11 run_scorer_parallelizable: Loading data elapsed time: 22.29 secs (0.37 mins)
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_11 set to: 4
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_11. Original rating length: 118317340
INFO:birdwatch.scorer: Ratings after topic filter: 118317340
INFO:birdwatch.matrix_factorization:Num epochs: 60
INFO:birdwatch.matrix_factorization:epoch 60 0.10348565876483917
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07774236053228378
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1557331085205078
INFO:birdwatch.scorer:MFGroupScorer_13 Final helpfulness-filtered MF elapsed time: 86.70 secs (1.45 mins)
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_13 final scoring, about to call diligence with 19815879 final round ratings.
INFO:birdwatch.helpfulness_scores:Unique Raters: 377369
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 574793
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 408324
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key]
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key]
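A SettingWithCopyWarning like the two above is raised when a column is assigned on a frame that pandas suspects is a slice of another frame, so it cannot tell whether the parent should change too. How `noteInitState` and `raterInitState` are built is not shown in this log; the sketch below merely assumes they come from a row selection and shows the common remedy of taking an explicit `.copy()`. `promote_round2_intercept` is a hypothetical helper and the frame is invented.

```python
import warnings

import pandas as pd


def promote_round2_intercept(state: pd.DataFrame, mask: pd.Series) -> pd.DataFrame:
    """Select rows, then overwrite internalNoteIntercept with the round-2 value.

    The explicit .copy() gives the subset its own data, so the column assignment
    is unambiguous and pandas has no reason to warn about chained assignment.
    """
    subset = state.loc[mask].copy()
    subset["internalNoteIntercept"] = subset["internalNoteInterceptRound2"]
    return subset


# Invented stand-in for noteInitState.
state = pd.DataFrame({
    "noteId": [1, 2, 3],
    "internalNoteIntercept": [0.1, 0.2, 0.3],
    "internalNoteInterceptRound2": [0.4, 0.5, 0.6],
})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = promote_round2_intercept(state, state["noteId"] > 1)

print(out["internalNoteIntercept"].tolist())  # round-2 values for notes 2 and 3
print([w.category.__name__ for w in caught])  # no SettingWithCopyWarning expected
```

If the intent is instead to update the parent frame, assigning through `.loc[mask, col]` on the parent itself, as the warning text recommends, avoids the intermediate slice entirely.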
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
noteId ... internalNoteInterceptRound2
0 1549781045201047554 ... 0.581661
1 1592925068132245504 ... -0.313237
2 1593079642092617729 ... 0.906070
3 1595167355637796876 ... -0.465690
4 1597230938316054532 ... -1.548695
... ... ... ...
607514 1855061828675498184 ... -0.216159
607515 1663589142351970305 ... -0.306378
607516 1741105701643268121 ... 1.283426
607517 1694299885778981367 ... -0.176601
607518 1783727968046432453 ... -0.203176
[607519 rows x 4 columns],
raterInitState:
raterParticipantId ... internalRaterInterceptRound2
0 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... ... -0.486221
1 00018DBB934257251EBCEE91D0722C71B7DD592A571398... ... NaN
2 00022C96980039352E2D04B5E533090FA8BA333F87C5EB... ... 0.249502
3 0002725E706CF18C040E21F30CE2D39994513C3BB8CF58... ... NaN
4 000274A83456E40A03B81628F432D06A3506E28C77FEA8... ... NaN
... ... ... ...
222120 FFFEB058BCC25277E2662DD3E8C0506FB1B23BA4D965EA... ... NaN
222121 FFFEB27D6E27351D14EB43777F265F694744ABB4B3B7AD... ... 0.311522
222122 FFFF0C7BF4089C6436CAB332B309A1A81C21E11CD61CE4... ... NaN
222123 FFFF3B1E5FB7927B196BCC7753E5CE5B2E64AFA90099E0... ... NaN
222124 FFFF7E0B3ADB6FC5FB42B0F01FFD24495410C1AE4AC986... ... -0.133762
[222125 rows x 5 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 103061, vs. num we are initializing: 222125
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 103061
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 103061, vs. num we are initializing: 222125
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 103061
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 103061, vs. num we are initializing: 222125
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 103061
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 609122, vs. num we are initializing: 607519
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 570354
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 38768
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 609122, vs. num we are initializing: 607519
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 570354
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 38768
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor)
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=4.497056 | time=0.1s
INFO:birdwatch.scorer: Ratings without assigned group: 0
INFO:birdwatch.scorer: Ratings after group filter: 1735379
INFO:birdwatch.scorer:MFGroupScorer_11 Filter input elapsed time: 39.21 secs (0.65 mins)
INFO:birdwatch.mf_base_scorer:seeding with 0
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note:
Num Ratings: 1118923, Num Unique Notes Rated: 93062, Num Unique Raters: 11799
INFO:birdwatch.scorer:MFGroupScorer_11 Prepare ratings elapsed time: 0.55 secs (0.01 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey]
INFO:birdwatch.helpfulness_scores:Unique Raters: 489800
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 712511
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 542394
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 377369
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 52237974
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 52237974
INFO:birdwatch.helpfulness_scores:Unique Raters: 6552
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 48937
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 7130
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 6552
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 731329
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 731329
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 6552, Notes: 92847
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:initialized global intercept
INFO:birdwatch.matrix_factorization:learning rate set to :0.2
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 7.876711148448523
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 111.61920024420024
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 7.9375824367485315
INFO:birdwatch.model:Freezing parameter: user_factors.weight
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight
INFO:birdwatch.model:Freezing parameter: global_intercept
INFO:birdwatch.matrix_factorization:epoch 0 0.1410684585571289
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09966675192117691
INFO:birdwatch.matrix_factorization:epoch 20 0.10076995193958282
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06930522620677948
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.434547 | time=12.5s
INFO:birdwatch.matrix_factorization:epoch 40 0.09890143573284149
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06741584092378616
INFO:birdwatch.helpfulness_scores:Unique Raters: 489451
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 712504
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 542045
INFO:birdwatch.matrix_factorization:epoch 60 0.09864601492881775
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06716492027044296
INFO:birdwatch.matrix_factorization:epoch 80 0.09861166775226593
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06712664663791656
INFO:birdwatch.matrix_factorization:Num epochs: 93
INFO:birdwatch.matrix_factorization:epoch 93 0.09860806167125702
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06712305545806885
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1667405664920807
INFO:birdwatch.scorer:MFGroupScorer_11 Final helpfulness-filtered MF elapsed time: 5.32 secs (0.09 mins)
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_11 final scoring, about to call diligence with 731329 final round ratings.
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key]
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key]
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
noteId internalNoteIntercept internalNoteFactor1 \
0 1643057880793325568 2.275487 3.530573
1 1660244070621380610 0.605727 -2.286807
2 1681713296271892480 -6.667027 1.022068
3 1686264916682919936 2.253821 -1.295951
4 1686753388883546113 -0.822459 2.509581
... ... ... ...
91732 1819830116945387728 -0.451016 -0.844434
91733 1819849846179627161 -0.449976 -0.844213
91734 1819925754207150395 -0.449914 -0.845834
91735 1870385083439513634 2.137305 2.945505
91736 1714789934614098338 -0.229163 0.925463
internalNoteInterceptRound2
0 2.275487
1 0.605727
2 -6.667027
3 2.253821
4 -0.822459
... ...
91732 -0.451016
91733 -0.449976
91734 -0.449914
91735 2.137305
91736 -0.229163
[91737 rows x 4 columns],
raterInitState:
raterParticipantId \
0 000045A5FA0CF004F68CBF2913506C37D540CF48522D33...
1 00018DBB934257251EBCEE91D0722C71B7DD592A571398...
2 0002725E706CF18C040E21F30CE2D39994513C3BB8CF58...
3 00029D1FDD352D79B5073189C3F2BDF6377581F50D66C1...
4 0002D1E11A8EA1E4B25048FA9D117406CE9EB1D3143BC9...
... ...
48932 FFFA43EFB0AAB3BFD273666FF123BFE69D863B9A2F5E44...
48933 FFFA49720F254411E1F79CA757C403F0A0217240BC4922...
48934 FFFC011F23086D8153F0A3FF336F33EE80521EC35F9ACD...
48935 FFFDAB98EE31EC0CC51169937F859D5B676870C6470C19...
48936 FFFEB058BCC25277E2662DD3E8C0506FB1B23BA4D965EA...
internalRaterIntercept internalRaterFactor1 internalRaterReputation \
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
... ... ... ...
48932 NaN NaN NaN
48933 0.052763 0.571174 0.795266
48934 NaN NaN NaN
48935 NaN NaN NaN
48936 NaN NaN NaN
internalRaterInterceptRound2
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
... ...
48932 NaN
48933 0.052763
48934 NaN
48935 NaN
48936 NaN
[48937 rows x 5 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 6552, vs. num we are initializing: 48937
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 6552
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 6552, vs. num we are initializing: 48937
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 6552
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 6552, vs. num we are initializing: 48937
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 6552
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 92847, vs. num we are initializing: 91737
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 90283
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 2564
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 92847, vs. num we are initializing: 91737
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 90283
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 2564
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor)
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=7.302128 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
noteId ... internalNoteInterceptRound2
0 1352796878438424576 ... -0.060955
1 1353415873227177985 ... 0.048938
2 1354586938863443971 ... NaN
3 1354588003075764229 ... NaN
4 1354588172659920899 ... NaN
... ... ... ...
1750501 1875355579318722848 ... NaN
1750502 1875355783946293371 ... NaN
1750503 1875355865437315075 ... NaN
1750504 1875355877789880344 ... NaN
1750505 1875356460340679029 ... NaN
[1750506 rows x 7 columns],
raterInitState:
raterParticipantId ... internalRaterInterceptRound2
0 0000010BB832A9CFDF102BF7B66896FA987C80FBB61EF6... ... 0.127391
1 000011269AD6F327AED0F4086A732B4052F9D28E8791E1... ... 0.090071
2 0000315D36021A528D85155729DDBF2E299BB8C3040878... ... 0.143150
3 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... ... -0.103966
4 00005300B9017670433392BF6767238D54E058EC25D5C5... ... 0.170236
... ... ... ...
590250 FFFFAB2FDBC1968F4CFE97A86D88963D702B636365B6CD... ... 0.182920
590251 FFFFBBAB3C66ABB4DBC2A3B486C3C673345C89B5858465... ... -0.063934
590252 FFFFC819886B2F837503D840D59EE8321A835AAF2B5C1E... ... -0.194488
590253 FFFFD54D8094D7620A7C3E162F98198FBDBD3401A4F2FB... ... -0.394162
590254 FFFFFE8909485374E33854B934713713CAC93CDB50C9D0... ... 0.005964
[590255 rows x 16 columns]
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.707204 | time=0.7s
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 590255, vs. num we are initializing: 590255
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.659630 | time=1.3s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.652598 | time=2.0s
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 590255
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.651290 | time=2.7s
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 590255, vs. num we are initializing: 590255
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.650836 | time=3.3s
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.650623 | time=4.0s
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 590255
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.650513 | time=4.7s
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 590255, vs. num we are initializing: 590255
INFO:birdwatch.reputation_matrix_factorization:epoch=230 | loss=2.650469 | time=5.1s
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 6552, vs. num we are initializing: 48937
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 6552
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.567179 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.474111 | time=0.7s
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 590255
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.472549 | time=1.3s
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.472500 | time=1.6s
INFO:birdwatch.diligence_model:Low diligence final loss: 0.4725
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object')
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1',
'internalNoteFactor1_max', 'internalNoteFactor1_median',
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig',
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig',
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac',
'internalNoteIntercept_max', 'internalNoteIntercept_min',
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'],
dtype='object')
INFO:birdwatch.scorer:MFGroupScorer_11 Low Diligence Reputation Model elapsed time: 8.20 secs (0.14 mins)
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFGroupScorer_11
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.407713 | time=25.0s
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 1205894, vs. num we are initializing: 1750506
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 1149359
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 56535
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 1205894, vs. num we are initializing: 1750506
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 1149359
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 56535
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor)
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.204937 | time=0.7s
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 494: noteStats = tagAggregates.merge(noteStats, on=c.noteIdKey, how="outer")
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
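The log does not show why `noteId` arrives as `float64` in this outer merge, but the consequence is worth checking: the note IDs printed in the `noteInitState` dumps above are around 1.3e18 to 1.9e18, well past 2**53, so a `float64` cannot represent every one of them exactly. The sketch below uses two IDs copied from those dumps and only demonstrates the round-trip behaviour; it makes no claim about whether the scorer's own outputs were affected.

```python
import pandas as pd

# Two noteIds copied from the noteInitState dumps in this log.
note_ids = pd.Series([1875355579318722848, 1352796878438424576], dtype="int64")

# float64 has a 53-bit significand; between 2**60 and 2**61 consecutive
# representable values are 256 apart, so IDs not divisible by 256 get rounded.
round_trip = note_ids.astype("float64").astype("int64")

changed = (round_trip != note_ids).tolist()
print(changed)  # first ID is altered by the round trip, second survives
print((round_trip - note_ids).tolist())
```

Keeping the key column in `int64` (or the nullable `Int64` dtype when gaps are possible) on both sides before merging avoids this class of silent ID corruption.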
INFO:birdwatch.constants:compute_scored_notes: compute tag aggregates elapsed time: 3.67 secs (0.06 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:82: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals.drop(columns_to_drop, inplace=True, axis=1)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:84: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = (
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:91: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ratings_w_user_totals[
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:95: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:99: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
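The SettingWithCopyWarning lines from incorrect_filter.py mean that `ratings_w_user_totals` was itself derived from another DataFrame (for example, by boolean indexing). As a result, pandas cannot tell whether writes to it are meant to reach the parent. One common fix is to take an explicit `.copy()` when the subset is created and then assign whole columns. The sketch below shows this with hypothetical column names and data, written against pandas 2.x (the version in this log). Whether the real code should copy, or instead write back to the parent with `.loc`, depends on intent the log does not show.

```python
import warnings

import pandas as pd
from pandas.errors import SettingWithCopyWarning

# Hypothetical stand-in for the ratings table; column names are illustrative.
ratings = pd.DataFrame(
    {
        "raterParticipantId": ["a", "b", "c"],
        "incorrectTagRate": [0.5, None, 0.2],
    }
)

with warnings.catch_warnings():
    # Escalate the warning so the sketch fails loudly if it would be emitted.
    warnings.simplefilter("error", SettingWithCopyWarning)

    # .copy() makes the subset an independent frame rather than a possible view.
    subset = ratings[ratings["raterParticipantId"] != "c"].copy()

    # Assign the whole column instead of calling fillna(..., inplace=True) on it.
    subset["incorrectTagRate"] = subset["incorrectTagRate"].fillna(0)

print(subset["incorrectTagRate"].tolist())
```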
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 499: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on num_voters_interval: result=float64 expected=int64 (allowed) | |
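Lines flagged "(allowed)", like the one above, show a count column (`num_voters_interval`) coming out of a merge as float64. This is the usual pandas behaviour when a left or outer merge leaves some rows without a match: the unmatched rows get NaN, and NaN cannot be stored in an int64 column. The sketch below reproduces the effect with made-up data. The column names mirror the log, but the frames are hypothetical. Treating a missing count as zero is also an assumption for illustration, not the scorer's logic.

```python
import pandas as pd

# Hypothetical inputs: three notes, but only two have voter counts.
note_stats = pd.DataFrame({"noteId": [10, 20, 30]})
voters = pd.DataFrame({"noteId": [10, 20], "num_voters_interval": [4, 7]})

merged = note_stats.merge(voters, on="noteId", how="left")
print(merged["num_voters_interval"].dtype)  # note 30 has no match, so it gets NaN

# If a missing count genuinely means "no voters" (an assumption), fill and cast back.
merged["num_voters_interval"] = merged["num_voters_interval"].fillna(0).astype("int64")
print(merged["num_voters_interval"].tolist())
```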
INFO:birdwatch.constants:compute_scored_notes: compute incorrect aggregates elapsed time: 3.27 secs (0.05 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.61 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.12 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.405916 | time=37.3s | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.63 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.55 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRHInertia (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRHInertia (v1.0) elapsed time: 0.92 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRHInertia (v1.0) elapsed time: 1.50 secs (0.02 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: TagFilter (v1.0) | |
INFO:birdwatch.scoring_rules:Candidate notes prior to tag filtering: 1745145 | |
INFO:birdwatch.scoring_rules:Checking note tags: | |
INFO:birdwatch.scoring_rules:notHelpfulOther | |
INFO:birdwatch.scoring_rules:notHelpfulIncorrect | |
INFO:birdwatch.scoring_rules:notHelpfulSourcesMissingOrUnreliable | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculationOrBias | |
INFO:birdwatch.scoring_rules:notHelpfulMissingKeyPoints | |
INFO:birdwatch.scoring_rules:notHelpfulOutdated | |
INFO:birdwatch.scoring_rules:notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:outlier filtering disabled for tag: notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:notHelpfulArgumentativeOrBiased | |
INFO:birdwatch.scoring_rules:notHelpfulOffTopic | |
INFO:birdwatch.scoring_rules:notHelpfulSpamHarassmentOrAbuse | |
INFO:birdwatch.scoring_rules:notHelpfulIrrelevantSources | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculation | |
INFO:birdwatch.scoring_rules:notHelpfulNoteNotNeeded | |
INFO:birdwatch.scoring_rules:Total {note, tag} pairs where tag filter logic triggered: 1712 | |
INFO:birdwatch.scoring_rules:Total unique notes impacted by tag filtering: 1179 | |
INFO:birdwatch.constants:Calling score_notes: TagFilter (v1.0) elapsed time: 2.85 secs (0.05 mins) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scoring_rules.py, in apply_scoring_rules, at line 1099: noteColumns = noteColumns.merge( | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:Applying scoring rule: TagFilter (v1.0) elapsed time: 3.44 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: CRHSuperThreshold (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: CRHSuperThreshold (v1.0) elapsed time: 0.13 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: CRHSuperThreshold (v1.0) elapsed time: 0.72 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: ElevatedCRHInertia (v1.0) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=110 | loss=2.405834 | time=45.5s | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept | |
INFO:birdwatch.constants:Calling score_notes: ElevatedCRHInertia (v1.0) elapsed time: 0.95 secs (0.02 mins) | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 103061, vs. num we are initializing: 222125 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 103061 | |
INFO:birdwatch.constants:Applying scoring rule: ElevatedCRHInertia (v1.0) elapsed time: 1.50 secs (0.02 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterIncorrect (v1.0) | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 489800 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 61628324 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 61628324 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.266507 | time=0.2s | |
INFO:birdwatch.scoring_rules:Total notes impacted by incorrect filtering: 279 | |
INFO:birdwatch.constants:Calling score_notes: FilterIncorrect (v1.0) elapsed time: 2.77 secs (0.05 mins) | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 489451 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 61502082 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 61502082 | |
INFO:birdwatch.constants:Applying scoring rule: FilterIncorrect (v1.0) elapsed time: 3.36 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLowDiligence (v1.0) | |
INFO:birdwatch.scoring_rules:Total notes impacted by low diligence filtering: 17966 | |
INFO:birdwatch.constants:Calling score_notes: FilterLowDiligence (v1.0) elapsed time: 2.78 secs (0.05 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterLowDiligence (v1.0) elapsed time: 3.41 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLargeFactor (v1.0) | |
INFO:birdwatch.scoring_rules:Total notes impacted by large factor filtering: 134 | |
INFO:birdwatch.constants:Calling score_notes: FilterLargeFactor (v1.0) elapsed time: 0.85 secs (0.01 mins) | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 377369, Notes: 1205353 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
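The FutureWarning from matrix_factorization.py already names the two replacements for `noteInit[col].fillna(0.0, inplace=True)`. Both are sketched below on a hypothetical `note_init` frame; the column names echo the log, but the data is made up. According to the warning, the chained form will stop updating the frame under pandas 3.0, whereas both of these forms write to the DataFrame directly.

```python
import pandas as pd

# Hypothetical stand-in for noteInit.
note_init = pd.DataFrame(
    {
        "noteId": [1, 2, 3],
        "internalNoteIntercept": [0.3, None, -0.1],
        "internalNoteFactor1": [None, 1.2, None],
    }
)

# Option 1: call fillna on the DataFrame itself with a {column: value} mapping.
note_init.fillna({"internalNoteIntercept": 0.0}, inplace=True)

# Option 2: assign the filled column back explicitly.
note_init["internalNoteFactor1"] = note_init["internalNoteFactor1"].fillna(0.0)

print(note_init.isna().sum().sum())
```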
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.constants:Applying scoring rule: FilterLargeFactor (v1.0) elapsed time: 1.48 secs (0.02 mins) | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.257681 | time=13.5s | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 43.338319977633105 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 138.4267759142908 | |
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 43.389993420987054 | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.matrix_factorization:epoch 0 0.12316302955150604 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09815917909145355 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=050 | loss=0.257596 | time=21.9s | |
INFO:birdwatch.diligence_model:Low diligence final loss: 0.2576 | |
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object') | |
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1', | |
'internalNoteFactor1_max', 'internalNoteFactor1_median', | |
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig', | |
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig', | |
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_13 Low Diligence Reputation Model elapsed time: 92.00 secs (1.53 mins) | |
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFGroupScorer_13 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.109874 | time=57.9s | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 41.37 secs (0.69 mins) | |
INFO:birdwatch.mf_base_scorer:sn cols: Index(['noteId', 'ratingWeight', 'notHelpfulOtherAdjusted', | |
'notHelpfulIncorrectAdjusted', | |
'notHelpfulSourcesMissingOrUnreliableAdjusted', | |
'notHelpfulOpinionSpeculationOrBiasAdjusted', | |
'notHelpfulMissingKeyPointsAdjusted', 'notHelpfulOutdatedAdjusted', | |
'notHelpfulHardToUnderstandAdjusted', | |
'notHelpfulArgumentativeOrBiasedAdjusted', 'notHelpfulOffTopicAdjusted', | |
'notHelpfulSpamHarassmentOrAbuseAdjusted', | |
'notHelpfulIrrelevantSourcesAdjusted', | |
'notHelpfulOpinionSpeculationAdjusted', | |
'notHelpfulNoteNotNeededAdjusted', 'notHelpfulOtherAdjustedRatio', | |
'notHelpfulIncorrectAdjustedRatio', | |
'notHelpfulSourcesMissingOrUnreliableAdjustedRatio', | |
'notHelpfulOpinionSpeculationOrBiasAdjustedRatio', | |
'notHelpfulMissingKeyPointsAdjustedRatio', | |
'notHelpfulOutdatedAdjustedRatio', | |
'notHelpfulHardToUnderstandAdjustedRatio', | |
'notHelpfulArgumentativeOrBiasedAdjustedRatio', | |
'notHelpfulOffTopicAdjustedRatio', | |
'notHelpfulSpamHarassmentOrAbuseAdjustedRatio', | |
'notHelpfulIrrelevantSourcesAdjustedRatio', | |
'notHelpfulOpinionSpeculationAdjustedRatio', | |
'notHelpfulNoteNotNeededAdjustedRatio', 'helpfulOther', | |
'helpfulInformative', 'helpfulClear', 'helpfulEmpathetic', | |
'helpfulGoodSources', 'helpfulUniqueContext', 'helpfulAddressesClaim', | |
'helpfulImportantContext', 'helpfulUnbiasedLanguage', 'notHelpfulOther', | |
'notHelpfulIncorrect', 'notHelpfulSourcesMissingOrUnreliable', | |
'notHelpfulOpinionSpeculationOrBias', 'notHelpfulMissingKeyPoints', | |
'notHelpfulOutdated', 'notHelpfulHardToUnderstand', | |
'notHelpfulArgumentativeOrBiased', 'notHelpfulOffTopic', | |
'notHelpfulSpamHarassmentOrAbuse', 'notHelpfulIrrelevantSources', | |
'notHelpfulOpinionSpeculation', 'notHelpfulNoteNotNeeded', 'numRatings', | |
'noteAuthorParticipantId', 'classification', 'currentStatus', | |
'internalNoteIntercept', 'internalNoteFactor1', | |
'lowDiligenceNoteIntercept', 'internalNoteFactor1_max', | |
'internalNoteFactor1_median', 'internalNoteFactor1_min', | |
'internalNoteFactor1_refit_orig', 'internalNoteIntercept_median', | |
'internalNoteIntercept_refit_orig', 'ratingCount_all', | |
'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'notHelpfulIncorrect_interval', 'p_incorrect_user_interval', | |
'num_voters_interval', 'tf_idf_incorrect_interval', | |
'internalRatingStatus', 'internalActiveRules', 'activeFilterTags', | |
'crhBool', 'crnhBool', 'awaitingBool'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_11 Final compute scored notes elapsed time: 72.61 secs (1.21 mins) | |
INFO:birdwatch.scorer:Postprocessing output for MFGroupScorer_11 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 489451, Notes: 1296961 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 489800, Notes: 1296988 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 47.42014756033527 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 125.65523821587861 | |
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 47.486095560494 | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 47.51649514104988 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 125.82344630461412 | |
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 47.582615745745436 | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.matrix_factorization:epoch 0 0.12271592020988464 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09755367785692215 | |
INFO:birdwatch.matrix_factorization:epoch 0 0.1227373406291008 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09757845848798752 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.10948103666305542 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08243284374475479 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.108417 | time=116.4s | |
INFO:birdwatch.scorer: Original noteScores length: 1750506 | |
INFO:birdwatch.scorer: Final noteScores length: 8643 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in score_final, at line 1190: noteScores = noteScores.merge( | |
PandasTypeError: Output mismatch on numFinalRoundRatings: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_11 Postprocess output elapsed time: 61.51 secs (1.03 mins) | |
INFO:birdwatch.run_scoring:MFGroupScorer_10 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10880401730537415 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0817750096321106 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.11040870100259781 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08333969861268997 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.1104157343506813 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08334934711456299 | |
INFO:birdwatch.run_scoring:MFGroupScorer_10 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_10 run_scorer_parallelizable: Loading data elapsed time: 23.44 secs (0.39 mins) | |
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_10 set to: 4 | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 494: noteStats = tagAggregates.merge(noteStats, on=c.noteIdKey, how="outer") | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:compute_scored_notes: compute tag aggregates elapsed time: 77.05 secs (1.28 mins) | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_10. Original rating length: 118317340 | |
INFO:birdwatch.scorer: Ratings after topic filter: 118317340 | |
INFO:birdwatch.matrix_factorization:Num epochs: 49 | |
INFO:birdwatch.matrix_factorization:epoch 49 0.10874594748020172 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08173158019781113 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1698254644870758 | |
INFO:birdwatch.scorer:MFCoreScorer Final helpfulness-filtered MF elapsed time: 183.04 secs (3.05 mins) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=0.108362 | time=176.9s | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings after group filter: 993406 | |
INFO:birdwatch.scorer:MFGroupScorer_10 Filter input elapsed time: 41.05 secs (0.68 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note: | |
Num Ratings: 511115, Num Unique Notes Rated: 45202, Num Unique Raters: 9504 | |
INFO:birdwatch.scorer:MFGroupScorer_10 Prepare ratings elapsed time: 0.28 secs (0.00 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)` | |
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey] | |
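This FutureWarning comes from calling `.fillna(False)` on an object-dtype column. pandas 2.2 still infers a bool result here, but that silent downcast is deprecated. The column is presumably object because it mixes True/False with missing values; this is an assumption, since the log does not show how it was built. The warning suggests `infer_objects`. A different route, sketched below, is to convert to pandas' nullable `boolean` dtype before filling, so there is never an object column to downcast.

```python
import warnings

import pandas as pd

# Hypothetical object column: True/False plus a missing value.
above_threshold = pd.Series([True, None, False], dtype=object)

with warnings.catch_warnings():
    # Turn FutureWarnings into errors to show this path does not trigger one.
    warnings.simplefilter("error", FutureWarning)
    # "boolean" holds True/False/<NA>, so fillna has nothing to downcast.
    filled = above_threshold.astype("boolean").fillna(False).astype(bool)

print(filled.dtype, filled.tolist())
```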
INFO:birdwatch.helpfulness_scores:Unique Raters: 4776 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 30734 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 5431 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 4776 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 347745 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 347745 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 4776, Notes: 45105 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 7.709677419354839 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 72.8109296482412 | |
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 7.851096856666743 | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.matrix_factorization:epoch 0 0.1504622846841812 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10350238531827927 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.09826453030109406 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.063314288854599 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.09529581665992737 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.061294954270124435 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.09498219192028046 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06137792766094208 | |
INFO:birdwatch.matrix_factorization:epoch 80 0.094944529235363 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06141533702611923 | |
INFO:birdwatch.matrix_factorization:Num epochs: 81 | |
INFO:birdwatch.matrix_factorization:epoch 81 0.094944529235363 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06141533702611923 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17643898725509644 | |
INFO:birdwatch.scorer:MFGroupScorer_10 Final helpfulness-filtered MF elapsed time: 2.38 secs (0.04 mins) | |
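The "Ratings per note" and "Ratings per user" lines are consistent with a plain division of the final-training rating count by the `Notes:` and `Users:` counts printed just before them. The check below recomputes these ratios from figures that appear in this section. It uses the MFGroupScorer_10 run plus the two interleaved full-size runs. Pairing the interleaved runs by matching their rater counts is an inference, not something the log states. This is arithmetic on logged values only, not the scorer's source.

```python
import math

# (ratings for final training, notes, users, logged ratings/note, logged ratings/user)
RUNS = [
    (347_745, 45_105, 4_776, 7.709677419354839, 72.8109296482412),
    (61_502_082, 1_296_961, 489_451, 47.42014756033527, 125.65523821587861),
    (61_628_324, 1_296_988, 489_800, 47.51649514104988, 125.82344630461412),
]

for ratings, notes, users, per_note, per_user in RUNS:
    # Both logged ratios agree with a straightforward float division.
    assert math.isclose(ratings / notes, per_note, rel_tol=1e-9)
    assert math.isclose(ratings / users, per_user, rel_tol=1e-9)
    print(f"{ratings / notes:.4f} ratings/note, {ratings / users:.4f} ratings/user")
```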
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_10 final scoring, about to call diligence with 347745 final round ratings. | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key] | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key] | |
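Both warnings above come from assigning a column on a DataFrame that pandas suspects is a slice of another frame. The sketch below reproduces the pattern and shows the two fixes that pandas documents: taking an explicit `.copy()` of the slice, or writing through a single `.loc` call on the parent. The frame and column names are hypothetical stand-ins, not the scorer's `noteInitState` code.

```python
import pandas as pd

def make_frame():
    return pd.DataFrame({"noteId": [1, 2, 3],
                         "intercept": [0.1, 0.2, 0.3],
                         "interceptRound2": [0.5, 0.6, 0.7]})

def assign_on_slice():
    """The warned pattern: filter a frame, then assign a column on the result."""
    df = make_frame()
    subset = df[df["noteId"] > 1]
    # In pandas 2.x with Copy-on-Write disabled, this emits SettingWithCopyWarning.
    subset["intercept"] = subset["interceptRound2"]
    return subset

def assign_on_copy():
    """Fix 1: take an explicit copy so the new frame owns its data."""
    df = make_frame()
    subset = df[df["noteId"] > 1].copy()
    subset["intercept"] = subset["interceptRound2"]
    return subset

def assign_via_loc():
    """Fix 2: write into the parent frame with one .loc call (aligned on index)."""
    df = make_frame()
    mask = df["noteId"] > 1
    df.loc[mask, "intercept"] = df.loc[mask, "interceptRound2"]
    return df
```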
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
noteId internalNoteIntercept internalNoteFactor1 \ | |
0 1653111205429403666 1.708474 -0.164413 | |
1 1661796202554294297 -0.167551 3.731221 | |
2 1715444846586929540 -1.217183 -1.369363 | |
3 1738503882395844941 -6.172967 -0.742741 | |
4 1738528131655323997 -3.485948 -0.206753 | |
... ... ... ... | |
43802 1781302829556130237 0.560532 -1.497579 | |
43803 1870681504772186619 0.618976 1.895533 | |
43804 1830682877715247309 -0.246469 0.880792 | |
43805 1736435011405181005 -0.213010 -0.917342 | |
43806 1828123440026378402 -0.417520 -0.085999 | |
internalNoteInterceptRound2 | |
0 1.708474 | |
1 -0.167551 | |
2 -1.217183 | |
3 -6.172967 | |
4 -3.485948 | |
... ... | |
43802 0.560532 | |
43803 0.618976 | |
43804 -0.246469 | |
43805 -0.213010 | |
43806 -0.417520 | |
[43807 rows x 4 columns], | |
raterInitState: | |
raterParticipantId \ | |
0 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... | |
1 00018DBB934257251EBCEE91D0722C71B7DD592A571398... | |
2 00029D1FDD352D79B5073189C3F2BDF6377581F50D66C1... | |
3 00037E5A04D7781E19E5AAF559E14512FF17E7F76C30AF... | |
4 00053CDCAC04E3692F4A01305C8F3D093CCE221157D539... | |
... ... | |
30729 FFE9E0E39C0049AD113CEF0AB5178393F13B15C4E7B31C... | |
30730 FFF104BC8D2B5E53432FF3E605B5D5D76EDECE29AFA0F5... | |
30731 FFF1316D167C80F6D36C904E952D720D8E8DAE052288D1... | |
30732 FFF5A46494A3BDEC6FFF8A38A777E53484648B186FCD76... | |
30733 FFFEB058BCC25277E2662DD3E8C0506FB1B23BA4D965EA... | |
internalRaterIntercept internalRaterFactor1 internalRaterReputation \ | |
0 NaN NaN NaN | |
1 NaN NaN NaN | |
2 NaN NaN NaN | |
3 NaN NaN NaN | |
4 NaN NaN NaN | |
... ... ... ... | |
30729 0.491791 0.118269 0.564683 | |
30730 0.185441 1.502850 0.418307 | |
30731 NaN NaN NaN | |
30732 NaN NaN NaN | |
30733 NaN NaN NaN | |
internalRaterInterceptRound2 | |
0 NaN | |
1 NaN | |
2 NaN | |
3 NaN | |
4 NaN | |
... ... | |
30729 0.491791 | |
30730 0.185441 | |
30731 NaN | |
30732 NaN | |
30733 NaN | |
[30734 rows x 5 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 4776, vs. num we are initializing: 30734 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 4776 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 4776, vs. num we are initializing: 30734 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 4776 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 4776, vs. num we are initializing: 30734 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 4776 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 45105, vs. num we are initializing: 43807 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 43880 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 1225 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 45105, vs. num we are initializing: 43807 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 43880 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 1225 | |
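In each of these blocks, `uninitialized + initialized` equals `num in dataset`: 43880 + 1225 = 45105 for notes, and 0 + 4776 = 4776 for raters. This suggests that the dataset's IDs are matched against the initialization frame and that only non-null matches are used. The sketch below reproduces that bookkeeping with a left merge. It is a hypothetical reconstruction, and `count_initialized` and its arguments are not taken from the scorer's source.

```python
import pandas as pd

def count_initialized(dataset_ids, init_state, id_col, value_col):
    """Hypothetical: count dataset IDs that receive a non-null initial value.

    Assumes init_state has at most one row per ID.
    """
    ids = pd.DataFrame({id_col: pd.unique(pd.Series(dataset_ids))})
    # A left merge keeps every ID in the dataset; IDs without a match get NaN.
    merged = ids.merge(init_state[[id_col, value_col]], on=id_col, how="left")
    initialized = int(merged[value_col].notna().sum())
    return {
        "num_in_dataset": len(ids),
        "num_initializing": len(init_state),
        "uninitialized": len(ids) - initialized,
        "initialized": initialized,
    }
```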
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=7.409797 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.620432 | time=0.4s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.579739 | time=0.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.573520 | time=1.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.572579 | time=1.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.572326 | time=1.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.572222 | time=2.1s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=190 | loss=2.572201 | time=2.2s | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 4776, vs. num we are initializing: 30734 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 4776 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.587514 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.488095 | time=0.3s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.486561 | time=0.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=080 | loss=0.486501 | time=0.9s | |
INFO:birdwatch.diligence_model:Low diligence final loss: 0.4865 | |
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object') | |
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1', | |
'internalNoteFactor1_max', 'internalNoteFactor1_median', | |
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig', | |
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig', | |
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_10 Low Diligence Reputation Model elapsed time: 3.76 secs (0.06 mins) | |
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFGroupScorer_10 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
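These `(allowed)` mismatches reflect the usual pandas upcast. When a merge leaves rows unmatched, the missing cells are filled with `NaN`, and a NumPy `int64` column cannot hold `NaN`, so the column becomes `float64`. The sketch below uses hypothetical frames to show the upcast and two common ways to keep integer counts: filling and casting back after the merge, or using pandas' nullable `Int64` dtype.

```python
import pandas as pd

notes = pd.DataFrame({"noteId": [1, 2, 3]})
stats = pd.DataFrame({"noteId": [1, 2], "numRatings": [5, 7]})  # int64

# Note 3 has no stats row, so its numRatings is NaN and the column becomes float64.
merged = notes.merge(stats, on="noteId", how="left")

# Option 1: fill the gaps and cast back to int64 (only if 0 is a valid default).
filled = merged.assign(numRatings=merged["numRatings"].fillna(0).astype("int64"))

# Option 2: use the nullable Int64 extension dtype so missing values stay <NA>.
stats_nullable = stats.astype({"numRatings": "Int64"})
merged_nullable = notes.merge(stats_nullable, on="noteId", how="left")
```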
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:82: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals.drop(columns_to_drop, inplace=True, axis=1) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:84: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ( | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:91: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:95: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:99: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
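The `FutureWarning`s above flag `df[col].fillna(0, inplace=True)`. Under Copy-on-Write, the inplace call acts on a temporary Series and never updates `df`. The warning text names the replacements, and the sketch below applies them to a hypothetical frame whose column names stand in for the `c.*Key` constants. Because `ratings_w_user_totals` is itself flagged as a slice, taking a `.copy()` of it first would also address the accompanying `SettingWithCopyWarning`s.

```python
import pandas as pd

df = pd.DataFrame({"incorrectTagRate": [0.5, None, 0.25],
                   "totalRatings": [4.0, None, 8.0]})

# Preferred: assign the filled Series back to the column.
df["incorrectTagRate"] = df["incorrectTagRate"].fillna(0)

# Alternative from the warning text: fill on the frame with a {column: value} dict.
df.fillna({"totalRatings": 0}, inplace=True)
```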
INFO:birdwatch.matrix_factorization:epoch 40 0.10978090018033981 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0827120766043663 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10977377742528915 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08270204812288284 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 499: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on num_voters_interval: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.constants:compute_scored_notes: compute incorrect aggregates elapsed time: 55.11 secs (0.92 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/pseudo_raters.py, in _add_extreme_raters_to_id_maps_and_params, at line 135: self.raterIdMapWithExtreme = pd.concat( | |
PandasTypeError: DataFrame concat on raterParticipantId: output=object inputs=[dtype('O'), dtype('int64')] (allowed) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/pseudo_raters.py, in _add_extreme_raters_to_id_maps_and_params, at line 151: self.raterParamsWithExtreme = pd.concat( | |
PandasTypeError: DataFrame concat on raterParticipantId: output=object inputs=[dtype('O'), dtype('int64')] (allowed) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/pseudo_raters.py, in _add_extreme_raters_to_id_maps_and_params, at line 135: self.raterIdMapWithExtreme = pd.concat( | |
PandasTypeError: DataFrame concat on raterParticipantId: output=object inputs=[dtype('O'), dtype('int64')] (allowed) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/pseudo_raters.py, in _add_extreme_raters_to_id_maps_and_params, at line 151: self.raterParamsWithExtreme = pd.concat( | |
PandasTypeError: DataFrame concat on raterParticipantId: output=object inputs=[dtype('O'), dtype('int64')] (allowed) | |
PandasTypeError: DataFrame concat on internalRaterFactor1: output=float64 inputs=[dtype('float32'), dtype('float64')] (allowed) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/pseudo_raters.py, in _add_extreme_raters_to_id_maps_and_params, at line 135: self.raterIdMapWithExtreme = pd.concat( | |
PandasTypeError: DataFrame concat on raterParticipantId: output=object inputs=[dtype('O'), dtype('int64')] (allowed) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.64 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/pseudo_raters.py, in _add_extreme_raters_to_id_maps_and_params, at line 151: self.raterParamsWithExtreme = pd.concat( | |
PandasTypeError: DataFrame concat on raterParticipantId: output=object inputs=[dtype('O'), dtype('int64')] (allowed) | |
PandasTypeError: DataFrame concat on internalRaterFactor1: output=float64 inputs=[dtype('float64'), dtype('float32')] (allowed) | |
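These concat notices record pandas' dtype promotion. Concatenating a string (`object`) ID column with an `int64` one yields `object`, and concatenating `float32` with `float64` yields `float64`. The sketch below uses hypothetical frames, and the integer pseudo-rater IDs are placeholders rather than the scorer's actual values.

```python
import numpy as np
import pandas as pd

real = pd.DataFrame({
    "raterParticipantId": ["000045A5", "00018DBB"],                   # strings
    "internalRaterFactor1": np.array([0.1, -0.3], dtype=np.float32),
})
extreme = pd.DataFrame({
    "raterParticipantId": [-1, -2],                                   # int64 placeholders
    "internalRaterFactor1": np.array([2.0, -2.0], dtype=np.float64),
})

# Mixed ID types promote to object; float32 + float64 promotes to float64.
combined = pd.concat([real, extreme], ignore_index=True)

# Casting the placeholder IDs to str first keeps every ID value a string.
extreme_str = extreme.astype({"raterParticipantId": str})
combined_str = pd.concat([real, extreme_str], ignore_index=True)
```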
INFO:birdwatch.constants:Pseudoraters: prepare data elapsed time: 0.53 secs (0.01 mins) | |
INFO:birdwatch.pseudo_raters:------------------ | |
INFO:birdwatch.pseudo_raters:Re-scoring all notes with extra rating added: {'internalRaterIntercept': None, 'internalRaterFactor1': None, 'helpfulNum': None} | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 494: noteStats = tagAggregates.merge(noteStats, on=c.noteIdKey, how="outer") | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
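This `noteId` mismatch matters beyond dtype bookkeeping. Note IDs in this log, such as 1870681504772186619, exceed 2^53, the largest range in which every integer is exactly representable as a float64. Once a `noteId` column has been upcast to `float64`, casting it back to `int64` can therefore return a different ID. The check below uses that ID from the `noteInitState` dump. The nullable `Int64` dtype keeps the exact value even when some rows are missing.

```python
import pandas as pd

note_id = 1870681504772186619  # a noteId from the noteInitState dump above

# Round-tripping through float64 rounds to the nearest representable value.
as_float = pd.Series([note_id], dtype="int64").astype("float64")
round_tripped = int(as_float.astype("int64").iloc[0])

# Nullable Int64 keeps the exact ID; reindex adds a missing row as <NA>.
nullable = pd.Series([note_id], dtype="int64").astype("Int64").reindex([0, 1])
```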
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.15 secs (0.00 mins) | |
INFO:birdwatch.constants:compute_scored_notes: compute tag aggregates elapsed time: 2.78 secs (0.05 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.78 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.04 secs (0.00 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:82: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals.drop(columns_to_drop, inplace=True, axis=1) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:84: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ( | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:91: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:95: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:99: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 377369, Notes: 1205353 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.72 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.58 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INIT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/pseudo_raters.py, in _check_note_parameters_same, at line 90: assert (noteParamsFromNewModel == self.noteParams).all().all() | |
PandasTypeError: Type expectation mismatch on noteId: found=bool expected=int64 | |
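The check in `_check_note_parameters_same` compares two frames with `==`. That comparison returns a boolean frame with the same column names, so the type checker sees a `noteId` column of dtype `bool`. The sketch below uses hypothetical frames to show this behaviour, along with whole-frame alternatives that avoid the intermediate boolean frame. Whether an exact or a tolerance-based comparison suits the scorer's check is an assumption left open here.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"noteId": [1, 2], "internalNoteIntercept": [0.25, -0.5]})
b = a.copy()

# Elementwise ==: every column, including noteId, comes back as bool.
elementwise = a == b
all_equal = bool(elementwise.all().all())

# Whole-frame alternatives with no bool-typed noteId column:
exact = a.equals(b)  # same shape, dtypes and values
close = np.allclose(a["internalNoteIntercept"], b["internalNoteIntercept"])
pd.testing.assert_frame_equal(a, b)  # raises AssertionError on any mismatch
```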
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.74 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRHInertia (v1.0) | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 499: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on num_voters_interval: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.constants:compute_scored_notes: compute incorrect aggregates elapsed time: 3.14 secs (0.05 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRHInertia (v1.0) elapsed time: 0.90 secs (0.01 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.61 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.15 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRHInertia (v1.0) elapsed time: 1.58 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: TagFilter (v1.0) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.84 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.63 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.matrix_factorization:epoch 0 0.13380815088748932 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10416073352098465 | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.68 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.73 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRHInertia (v1.0) | |
INFO:birdwatch.scoring_rules:Candidate notes prior to tag filtering: 1729576 | |
INFO:birdwatch.scoring_rules:Checking note tags: | |
INFO:birdwatch.scoring_rules:notHelpfulOther | |
INFO:birdwatch.scoring_rules:notHelpfulIncorrect | |
INFO:birdwatch.scoring_rules:notHelpfulSourcesMissingOrUnreliable | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculationOrBias | |
INFO:birdwatch.scoring_rules:notHelpfulMissingKeyPoints | |
INFO:birdwatch.scoring_rules:notHelpfulOutdated | |
INFO:birdwatch.scoring_rules:notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:outlier filtering disabled for tag: notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:notHelpfulArgumentativeOrBiased | |
INFO:birdwatch.scoring_rules:notHelpfulOffTopic | |
INFO:birdwatch.scoring_rules:notHelpfulSpamHarassmentOrAbuse | |
INFO:birdwatch.scoring_rules:notHelpfulIrrelevantSources | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculation | |
INFO:birdwatch.scoring_rules:notHelpfulNoteNotNeeded | |
INFO:birdwatch.scoring_rules:Total {note, tag} pairs where tag filter logic triggered: 111941 | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRHInertia (v1.0) elapsed time: 0.97 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRHInertia (v1.0) elapsed time: 1.58 secs (0.03 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: TagFilter (v1.0) | |
INFO:birdwatch.scoring_rules:Total unique notes impacted by tag filtering: 56554 | |
INFO:birdwatch.constants:Calling score_notes: TagFilter (v1.0) elapsed time: 4.36 secs (0.07 mins) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scoring_rules.py, in apply_scoring_rules, at line 1099: noteColumns = noteColumns.merge( | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:Applying scoring rule: TagFilter (v1.0) elapsed time: 5.05 secs (0.08 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: CRHSuperThreshold (v1.0)
INFO:birdwatch.constants:Calling score_notes: CRHSuperThreshold (v1.0) elapsed time: 0.13 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: CRHSuperThreshold (v1.0) elapsed time: 0.79 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: ElevatedCRHInertia (v1.0)
INFO:birdwatch.constants:Calling score_notes: ElevatedCRHInertia (v1.0) elapsed time: 0.89 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Candidate notes prior to tag filtering: 1747758
INFO:birdwatch.scoring_rules:Checking note tags:
INFO:birdwatch.scoring_rules:notHelpfulOther
INFO:birdwatch.scoring_rules:notHelpfulIncorrect
INFO:birdwatch.scoring_rules:notHelpfulSourcesMissingOrUnreliable
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculationOrBias
INFO:birdwatch.scoring_rules:notHelpfulMissingKeyPoints
INFO:birdwatch.scoring_rules:notHelpfulOutdated
INFO:birdwatch.scoring_rules:notHelpfulHardToUnderstand
INFO:birdwatch.scoring_rules:outlier filtering disabled for tag: notHelpfulHardToUnderstand
INFO:birdwatch.scoring_rules:notHelpfulArgumentativeOrBiased
INFO:birdwatch.scoring_rules:notHelpfulOffTopic
INFO:birdwatch.scoring_rules:notHelpfulSpamHarassmentOrAbuse
INFO:birdwatch.scoring_rules:notHelpfulIrrelevantSources
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculation
INFO:birdwatch.scoring_rules:notHelpfulNoteNotNeeded
INFO:birdwatch.scoring_rules:Total {note, tag} pairs where tag filter logic triggered: 693
INFO:birdwatch.scoring_rules:Total unique notes impacted by tag filtering: 395
INFO:birdwatch.constants:Calling score_notes: TagFilter (v1.0) elapsed time: 3.03 secs (0.05 mins)
INFO:birdwatch.constants:Applying scoring rule: ElevatedCRHInertia (v1.0) elapsed time: 1.57 secs (0.03 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterIncorrect (v1.0)
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scoring_rules.py, in apply_scoring_rules, at line 1099: noteColumns = noteColumns.merge(
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
INFO:birdwatch.constants:Applying scoring rule: TagFilter (v1.0) elapsed time: 3.66 secs (0.06 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: CRHSuperThreshold (v1.0)
INFO:birdwatch.constants:Calling score_notes: CRHSuperThreshold (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: CRHSuperThreshold (v1.0) elapsed time: 0.71 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: ElevatedCRHInertia (v1.0)
INFO:birdwatch.constants:Calling score_notes: ElevatedCRHInertia (v1.0) elapsed time: 0.95 secs (0.02 mins)
INFO:birdwatch.scoring_rules:Total notes impacted by incorrect filtering: 12624
INFO:birdwatch.constants:Calling score_notes: FilterIncorrect (v1.0) elapsed time: 2.54 secs (0.04 mins)
INFO:birdwatch.constants:Applying scoring rule: ElevatedCRHInertia (v1.0) elapsed time: 1.65 secs (0.03 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterIncorrect (v1.0)
INFO:birdwatch.constants:Applying scoring rule: FilterIncorrect (v1.0) elapsed time: 3.20 secs (0.05 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLowDiligence (v1.0)
INFO:birdwatch.scoring_rules:Total notes impacted by incorrect filtering: 142
INFO:birdwatch.constants:Calling score_notes: FilterIncorrect (v1.0) elapsed time: 2.97 secs (0.05 mins)
INFO:birdwatch.scoring_rules:Total notes impacted by low diligence filtering: 100406
INFO:birdwatch.constants:Calling score_notes: FilterLowDiligence (v1.0) elapsed time: 2.54 secs (0.04 mins)
INFO:birdwatch.constants:Applying scoring rule: FilterIncorrect (v1.0) elapsed time: 3.61 secs (0.06 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLowDiligence (v1.0)
INFO:birdwatch.constants:Applying scoring rule: FilterLowDiligence (v1.0) elapsed time: 3.19 secs (0.05 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLargeFactor (v1.0)
INFO:birdwatch.scoring_rules:Total notes impacted by large factor filtering: 305
INFO:birdwatch.constants:Calling score_notes: FilterLargeFactor (v1.0) elapsed time: 0.76 secs (0.01 mins)
INFO:birdwatch.constants:Applying scoring rule: FilterLargeFactor (v1.0) elapsed time: 1.48 secs (0.02 mins)
INFO:birdwatch.scoring_rules:Total notes impacted by low diligence filtering: 8669
INFO:birdwatch.constants:Calling score_notes: FilterLowDiligence (v1.0) elapsed time: 2.89 secs (0.05 mins)
INFO:birdwatch.constants:Applying scoring rule: FilterLowDiligence (v1.0) elapsed time: 3.49 secs (0.06 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLargeFactor (v1.0)
INFO:birdwatch.scoring_rules:Total notes impacted by large factor filtering: 62
INFO:birdwatch.constants:Calling score_notes: FilterLargeFactor (v1.0) elapsed time: 0.91 secs (0.02 mins)
INFO:birdwatch.constants:Applying scoring rule: FilterLargeFactor (v1.0) elapsed time: 1.51 secs (0.03 mins)
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=0.108360 | time=237.3s
INFO:birdwatch.reputation_matrix_factorization:epoch=130 | loss=0.108360 | time=257.3s
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 590255, vs. num we are initializing: 590255
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 590255
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.009269 | time=0.8s
INFO:birdwatch.matrix_factorization:Num epochs: 59
INFO:birdwatch.matrix_factorization:epoch 59 0.10969852656126022
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0825127437710762
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17131315171718597
INFO:birdwatch.scorer:MFExpansionPlusScorer Final helpfulness-filtered MF elapsed time: 246.72 secs (4.11 mins)
INFO:birdwatch.mf_base_scorer:In MFExpansionPlusScorer final scoring, about to call diligence with 61628324 final round ratings.
INFO:birdwatch.matrix_factorization:Num epochs: 59
INFO:birdwatch.matrix_factorization:epoch 59 0.10969139635562897
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08250270783901215
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17132236063480377
INFO:birdwatch.scorer:MFExpansionScorer Final helpfulness-filtered MF elapsed time: 244.55 secs (4.08 mins)
INFO:birdwatch.mf_base_scorer:In MFExpansionScorer final scoring, about to call diligence with 61502082 final round ratings.
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 39.73 secs (0.66 mins)
INFO:birdwatch.mf_base_scorer:sn cols: Index(['noteId', 'ratingWeight', 'notHelpfulOtherAdjusted',
'notHelpfulIncorrectAdjusted',
'notHelpfulSourcesMissingOrUnreliableAdjusted',
'notHelpfulOpinionSpeculationOrBiasAdjusted',
'notHelpfulMissingKeyPointsAdjusted', 'notHelpfulOutdatedAdjusted',
'notHelpfulHardToUnderstandAdjusted',
'notHelpfulArgumentativeOrBiasedAdjusted', 'notHelpfulOffTopicAdjusted',
'notHelpfulSpamHarassmentOrAbuseAdjusted',
'notHelpfulIrrelevantSourcesAdjusted',
'notHelpfulOpinionSpeculationAdjusted',
'notHelpfulNoteNotNeededAdjusted', 'notHelpfulOtherAdjustedRatio',
'notHelpfulIncorrectAdjustedRatio',
'notHelpfulSourcesMissingOrUnreliableAdjustedRatio',
'notHelpfulOpinionSpeculationOrBiasAdjustedRatio',
'notHelpfulMissingKeyPointsAdjustedRatio',
'notHelpfulOutdatedAdjustedRatio',
'notHelpfulHardToUnderstandAdjustedRatio',
'notHelpfulArgumentativeOrBiasedAdjustedRatio',
'notHelpfulOffTopicAdjustedRatio',
'notHelpfulSpamHarassmentOrAbuseAdjustedRatio',
'notHelpfulIrrelevantSourcesAdjustedRatio',
'notHelpfulOpinionSpeculationAdjustedRatio',
'notHelpfulNoteNotNeededAdjustedRatio', 'helpfulOther',
'helpfulInformative', 'helpfulClear', 'helpfulEmpathetic',
'helpfulGoodSources', 'helpfulUniqueContext', 'helpfulAddressesClaim',
'helpfulImportantContext', 'helpfulUnbiasedLanguage', 'notHelpfulOther',
'notHelpfulIncorrect', 'notHelpfulSourcesMissingOrUnreliable',
'notHelpfulOpinionSpeculationOrBias', 'notHelpfulMissingKeyPoints',
'notHelpfulOutdated', 'notHelpfulHardToUnderstand',
'notHelpfulArgumentativeOrBiased', 'notHelpfulOffTopic',
'notHelpfulSpamHarassmentOrAbuse', 'notHelpfulIrrelevantSources',
'notHelpfulOpinionSpeculation', 'notHelpfulNoteNotNeeded', 'numRatings',
'noteAuthorParticipantId', 'classification', 'currentStatus',
'internalNoteIntercept', 'internalNoteFactor1',
'lowDiligenceNoteIntercept', 'internalNoteFactor1_max',
'internalNoteFactor1_median', 'internalNoteFactor1_min',
'internalNoteFactor1_refit_orig', 'internalNoteIntercept_median',
'internalNoteIntercept_refit_orig', 'ratingCount_all',
'ratingCount_neg_fac', 'ratingCount_pos_fac',
'internalNoteIntercept_max', 'internalNoteIntercept_min',
'notHelpfulIncorrect_interval', 'p_incorrect_user_interval',
'num_voters_interval', 'tf_idf_incorrect_interval',
'internalRatingStatus', 'internalActiveRules', 'activeFilterTags',
'crhBool', 'crnhBool', 'awaitingBool'],
dtype='object')
INFO:birdwatch.matrix_factorization:epoch 20 0.10995960235595703
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0827786847949028
INFO:birdwatch.scorer:MFGroupScorer_13 Final compute scored notes elapsed time: 229.99 secs (3.83 mins)
INFO:birdwatch.scorer:Postprocessing output for MFGroupScorer_13
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 39.29 secs (0.65 mins)
INFO:birdwatch.mf_base_scorer:sn cols: Index(['noteId', 'ratingWeight', 'notHelpfulOtherAdjusted',
'notHelpfulIncorrectAdjusted',
'notHelpfulSourcesMissingOrUnreliableAdjusted',
'notHelpfulOpinionSpeculationOrBiasAdjusted',
'notHelpfulMissingKeyPointsAdjusted', 'notHelpfulOutdatedAdjusted',
'notHelpfulHardToUnderstandAdjusted',
'notHelpfulArgumentativeOrBiasedAdjusted', 'notHelpfulOffTopicAdjusted',
'notHelpfulSpamHarassmentOrAbuseAdjusted',
'notHelpfulIrrelevantSourcesAdjusted',
'notHelpfulOpinionSpeculationAdjusted',
'notHelpfulNoteNotNeededAdjusted', 'notHelpfulOtherAdjustedRatio',
'notHelpfulIncorrectAdjustedRatio',
'notHelpfulSourcesMissingOrUnreliableAdjustedRatio',
'notHelpfulOpinionSpeculationOrBiasAdjustedRatio',
'notHelpfulMissingKeyPointsAdjustedRatio',
'notHelpfulOutdatedAdjustedRatio',
'notHelpfulHardToUnderstandAdjustedRatio',
'notHelpfulArgumentativeOrBiasedAdjustedRatio',
'notHelpfulOffTopicAdjustedRatio',
'notHelpfulSpamHarassmentOrAbuseAdjustedRatio',
'notHelpfulIrrelevantSourcesAdjustedRatio',
'notHelpfulOpinionSpeculationAdjustedRatio',
'notHelpfulNoteNotNeededAdjustedRatio', 'helpfulOther',
'helpfulInformative', 'helpfulClear', 'helpfulEmpathetic',
'helpfulGoodSources', 'helpfulUniqueContext', 'helpfulAddressesClaim',
'helpfulImportantContext', 'helpfulUnbiasedLanguage', 'notHelpfulOther',
'notHelpfulIncorrect', 'notHelpfulSourcesMissingOrUnreliable',
'notHelpfulOpinionSpeculationOrBias', 'notHelpfulMissingKeyPoints',
'notHelpfulOutdated', 'notHelpfulHardToUnderstand',
'notHelpfulArgumentativeOrBiased', 'notHelpfulOffTopic',
'notHelpfulSpamHarassmentOrAbuse', 'notHelpfulIrrelevantSources',
'notHelpfulOpinionSpeculation', 'notHelpfulNoteNotNeeded', 'numRatings',
'noteAuthorParticipantId', 'classification', 'currentStatus',
'internalNoteIntercept', 'internalNoteFactor1',
'lowDiligenceNoteIntercept', 'internalNoteFactor1_max',
'internalNoteFactor1_median', 'internalNoteFactor1_min',
'internalNoteFactor1_refit_orig', 'internalNoteIntercept_median',
'internalNoteIntercept_refit_orig', 'ratingCount_all',
'ratingCount_neg_fac', 'ratingCount_pos_fac',
'internalNoteIntercept_max', 'internalNoteIntercept_min',
'notHelpfulIncorrect_interval', 'p_incorrect_user_interval',
'num_voters_interval', 'tf_idf_incorrect_interval',
'internalRatingStatus', 'internalActiveRules', 'activeFilterTags',
'crhBool', 'crnhBool', 'awaitingBool'],
dtype='object')
INFO:birdwatch.scorer:MFGroupScorer_10 Final compute scored notes elapsed time: 70.03 secs (1.17 mins)
INFO:birdwatch.scorer:Postprocessing output for MFGroupScorer_10
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.007150 | time=56.9s
INFO:birdwatch.matrix_factorization:epoch 40 0.10886719077825546
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0817500650882721
INFO:birdwatch.scorer: Original noteScores length: 1750506
INFO:birdwatch.scorer: Final noteScores length: 5169
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in score_final, at line 1190: noteScores = noteScores.merge(
PandasTypeError: Output mismatch on numFinalRoundRatings: result=float64 expected=int64 (allowed)
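The `numFinalRoundRatings` mismatch above is flagged as allowed. One common way an integer column becomes float64 after a merge is a left merge in which some notes have no matching row, so pandas fills NaN. A sketch under that assumption (the log does not show the merge inputs; the frames below are hypothetical): casting the right-hand column to the nullable `Int64` dtype before merging keeps integers and marks the gaps as `<NA>`.

```python
import pandas as pd

# Hypothetical inputs: every scored note, and rating counts for only some of them.
note_scores = pd.DataFrame({"noteId": [101, 102, 103]})
counts = pd.DataFrame({"noteId": [101, 103], "numFinalRoundRatings": [12, 7]})

# Note 102 has no count, so the left merge inserts NaN and upcasts to float64.
merged = note_scores.merge(counts, on="noteId", how="left")
print(merged["numFinalRoundRatings"].dtype)  # float64

# Casting to the nullable Int64 dtype first keeps integers; the gap becomes <NA>.
counts_nullable = counts.astype({"numFinalRoundRatings": "Int64"})
merged_nullable = note_scores.merge(counts_nullable, on="noteId", how="left")
print(merged_nullable["numFinalRoundRatings"].dtype)  # Int64
```

Filling the gap with an explicit value (for example `0`) and casting back to int64 is the other common choice; which one is correct depends on whether "no final-round ratings" should differ from "zero ratings".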
INFO:birdwatch.scorer:MFGroupScorer_10 Postprocess output elapsed time: 51.85 secs (0.86 mins)
INFO:birdwatch.run_scoring:MFGroupScorer_9 run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.scorer: Original noteScores length: 1750506
INFO:birdwatch.scorer: Final noteScores length: 114714
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in score_final, at line 1190: noteScores = noteScores.merge(
PandasTypeError: Output mismatch on numFinalRoundRatings: result=float64 expected=int64 (allowed)
INFO:birdwatch.scorer:MFGroupScorer_13 Postprocess output elapsed time: 70.84 secs (1.18 mins)
INFO:birdwatch.run_scoring:MFGroupScorer_8 run_scorer_parallelizable just started in parallel: loading data from shared memory.
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key]
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key]
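The SettingWithCopyWarning above is raised when a column is assigned on a DataFrame that pandas tracks as a slice of another one. A minimal sketch of the usual remedy, assuming (the log does not show how `noteInitState` is built) that it comes from a row filter: take an explicit `.copy()` when slicing, then assign. The column names mirror those printed in the log; the data and the filter are hypothetical.

```python
import warnings

import pandas as pd

# Hypothetical source frame; column names mirror the ones in the log.
all_notes = pd.DataFrame({
    "noteId": [1, 2, 3],
    "internalNoteInterceptRound2": [1.9, -1.7, float("nan")],
})

# A row filter returns a frame pandas may flag as a copy of a slice.
# Calling .copy() makes ownership explicit, so assigning to it is unambiguous.
note_init_state = all_notes[all_notes["internalNoteInterceptRound2"].notna()].copy()

with warnings.catch_warnings():
    warnings.simplefilter("error")  # turn any warning into an exception
    note_init_state["internalNoteIntercept"] = note_init_state["internalNoteInterceptRound2"]

print(note_init_state["internalNoteIntercept"].tolist())  # [1.9, -1.7]
```

The assignment touches only `note_init_state`; `all_notes` is unchanged, which is the behavior the warning says the original code was ambiguous about.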
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
noteId ... internalNoteInterceptRound2
0 1715437541212520617 ... 1.918703
1 1722158725807378589 ... -1.756146
2 1724462438022554032 ... -0.929131
3 1724471553906131352 ... -0.987910
4 1733114336250380782 ... -1.818934
... ... ... ...
1295075 1725550033179742375 ... -0.203534
1295076 1806065703117832239 ... -0.227197
1295077 1748887262912274653 ... -0.244440
1295078 1737522200616263780 ... 0.876775
1295079 1872611622075723818 ... -0.243256
[1295080 rows x 4 columns],
raterInitState:
raterParticipantId ... internalRaterInterceptRound2
0 000011269AD6F327AED0F4086A732B4052F9D28E8791E1... ... -0.629526
1 00003B703F86036C51F4F4B4C9F77B00C92D882421DA73... ... -0.169457
2 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... ... -0.660671
3 00004D45B2AFE9EA96333B280009DCC621851088264E8F... ... NaN
4 00005300B9017670433392BF6767238D54E058EC25D5C5... ... -0.192502
... ... ... ...
712506 FFFFBBAB3C66ABB4DBC2A3B486C3C673345C89B5858465... ... 0.179834
712507 FFFFC46B8555A97065DB39F7D600C8BB643F7F3EBD810E... ... -0.259912
712508 FFFFC819886B2F837503D840D59EE8321A835AAF2B5C1E... ... -0.059480
712509 FFFFD54D8094D7620A7C3E162F98198FBDBD3401A4F2FB... ... NaN
712510 FFFFFE8909485374E33854B934713713CAC93CDB50C9D0... ... -0.490807
[712511 rows x 5 columns]
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key]
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key]
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
noteId ... internalNoteInterceptRound2
0 1715437541212520617 ... 1.919350
1 1722158725807378589 ... -1.758969
2 1724462438022554032 ... -0.960170
3 1724471553906131352 ... -1.057418
4 1733114336250380782 ... -1.813292
... ... ... ...
1295049 1725550033179742375 ... -0.204101
1295050 1806065703117832239 ... -0.217258
1295051 1748887262912274653 ... -0.247808
1295052 1737522200616263780 ... 0.873060
1295053 1872611622075723818 ... -0.243832
[1295054 rows x 4 columns],
raterInitState:
raterParticipantId ... internalRaterInterceptRound2
0 000011269AD6F327AED0F4086A732B4052F9D28E8791E1... ... -0.621737
1 00003B703F86036C51F4F4B4C9F77B00C92D882421DA73... ... -0.169402
2 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... ... -0.671606
3 00004D45B2AFE9EA96333B280009DCC621851088264E8F... ... NaN
4 00005300B9017670433392BF6767238D54E058EC25D5C5... ... -0.199052
... ... ... ...
712499 FFFFBBAB3C66ABB4DBC2A3B486C3C673345C89B5858465... ... 0.175126
712500 FFFFC46B8555A97065DB39F7D600C8BB643F7F3EBD810E... ... -0.255379
712501 FFFFC819886B2F837503D840D59EE8321A835AAF2B5C1E... ... -0.061838
712502 FFFFD54D8094D7620A7C3E162F98198FBDBD3401A4F2FB... ... NaN
712503 FFFFFE8909485374E33854B934713713CAC93CDB50C9D0... ... -0.489205
[712504 rows x 5 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 489800, vs. num we are initializing: 712511
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 489451, vs. num we are initializing: 712504
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 489800
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 489451
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 489800, vs. num we are initializing: 712511
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 489451, vs. num we are initializing: 712504
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 489800
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 489451
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 489800, vs. num we are initializing: 712511
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 489451, vs. num we are initializing: 712504
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 489800
INFO:birdwatch.run_scoring:MFGroupScorer_9 run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFGroupScorer_9 run_scorer_parallelizable: Loading data elapsed time: 22.94 secs (0.38 mins)
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_9 set to: 4
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 489451
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_9. Original rating length: 118317340
INFO:birdwatch.scorer: Ratings after topic filter: 118317340
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 1296988, vs. num we are initializing: 1295080
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 1296961, vs. num we are initializing: 1295054
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 1238892
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 58096
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 1238865
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 58096
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 1296988, vs. num we are initializing: 1295080
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 1296961, vs. num we are initializing: 1295054
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 1238892
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 58096
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor)
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 1238865
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 58096
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor)
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.959213 | time=0.8s
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=3.959929 | time=0.7s
INFO:birdwatch.run_scoring:MFGroupScorer_8 run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFGroupScorer_8 run_scorer_parallelizable: Loading data elapsed time: 24.21 secs (0.40 mins)
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_8 set to: 4
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_8. Original rating length: 118317340
INFO:birdwatch.scorer: Ratings after topic filter: 118317340
INFO:birdwatch.matrix_factorization:epoch 60 0.10870382189750671
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08161847293376923
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.007132 | time=118.0s
INFO:birdwatch.scorer: Ratings without assigned group: 0
INFO:birdwatch.scorer: Ratings after group filter: 5553700
INFO:birdwatch.scorer:MFGroupScorer_9 Filter input elapsed time: 43.46 secs (0.72 mins)
INFO:birdwatch.mf_base_scorer:seeding with 0
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note:
Num Ratings: 4917877, Num Unique Notes Rated: 158637, Num Unique Raters: 52319
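The filter reported above keeps notes rated by at least five raters, while a minimum of 0 ratings per rater places no constraint on raters. A single-pass sketch of that rule, assuming ratings are rows keyed by `noteId` and `raterParticipantId` (column names as printed elsewhere in this log); `filter_min_counts` is a hypothetical helper, and the scorer's actual implementation may differ, for example by re-applying the two thresholds until neither changes.

```python
import pandas as pd


def filter_min_counts(ratings: pd.DataFrame, min_raters_per_note: int = 5,
                      min_ratings_per_rater: int = 0) -> pd.DataFrame:
    """Hypothetical helper: drop notes, then raters, below the given thresholds."""
    # Number of distinct raters for the note each rating belongs to.
    raters_per_note = ratings.groupby("noteId")["raterParticipantId"].transform("nunique")
    kept = ratings[raters_per_note >= min_raters_per_note]
    # Number of surviving ratings made by each rater.
    ratings_per_rater = kept.groupby("raterParticipantId")["noteId"].transform("count")
    return kept[ratings_per_rater >= min_ratings_per_rater]


ratings = pd.DataFrame({
    "noteId": [1] * 5 + [2] * 2,
    "raterParticipantId": ["a", "b", "c", "d", "e", "a", "b"],
})
filtered = filter_min_counts(ratings)
print(filtered["noteId"].unique().tolist())  # [1]
```

Note 2 has only two raters, so all of its ratings are dropped; with `min_ratings_per_rater=0` the second step keeps everything that survived the first.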
INFO:birdwatch.scorer:MFGroupScorer_9 Prepare ratings elapsed time: 2.51 secs (0.04 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey]
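The FutureWarning above comes from calling `.fillna(False)` on an object-dtype column, which pandas 2.x currently downcasts to bool silently. A sketch that does not rely on that downcast, assuming the column holds only True, False, and missing values (its real contents are not shown in this log, and the column name `aboveThreshold` is a hypothetical stand-in for `c.aboveHelpfulnessThresholdKey`): convert to the nullable `boolean` dtype first, then fill.

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype column mixing booleans and missing values.
scores = pd.DataFrame({"aboveThreshold": pd.Series([True, np.nan, False], dtype=object)})

# Converting to the nullable "boolean" dtype first means fillna never has to downcast.
filled = scores["aboveThreshold"].astype("boolean").fillna(False).astype(bool)
print(filled.dtype)  # bool
print(filled.tolist())  # [True, False, False]
```

The warning text itself names the other route: opt in with `pd.set_option('future.no_silent_downcasting', True)` and call `.infer_objects(copy=False)` on the filled result.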
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.439539 | time=46.2s
INFO:birdwatch.helpfulness_scores:Unique Raters: 28355
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 89428
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 30605
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 28355
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 3156203
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 3156203
INFO:birdwatch.scorer: Ratings without assigned group: 0
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.439297 | time=49.1s
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 28355, Notes: 158479
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
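The chained-assignment FutureWarnings above flag `noteInit[col].fillna(0.0, inplace=True)`, which the warning says will stop modifying `noteInit` in pandas 3.0. A sketch of the two replacements the warning itself suggests; the frame and its values are hypothetical, and the column names stand in for `c.internalNoteInterceptKey` and `c.note_factor_key(i)` (assumed here to resolve to `internalNoteIntercept` and `internalNoteFactor1`, names that appear elsewhere in this log).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for noteInit with missing initial values.
note_init = pd.DataFrame({
    "noteId": [1, 2],
    "internalNoteIntercept": [0.4, np.nan],
    "internalNoteFactor1": [np.nan, -0.3],
})

# Option 1: assign the filled column back instead of filling in place.
note_init["internalNoteIntercept"] = note_init["internalNoteIntercept"].fillna(0.0)

# Option 2: call fillna on the DataFrame itself with a per-column mapping.
note_init.fillna({"internalNoteFactor1": 0.0}, inplace=True)

print(note_init.isna().sum().sum())  # 0
```

Both forms operate on `note_init` directly rather than on an intermediate column object, so they behave the same with and without Copy-on-Write.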
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 19.915591340177563 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 111.31028037383177 | |
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 20.032298846923542 | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.matrix_factorization:epoch 0 0.14103776216506958 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1035650297999382 | |
INFO:birdwatch.scorer: Ratings after group filter: 755067 | |
INFO:birdwatch.scorer:MFGroupScorer_8 Filter input elapsed time: 40.37 secs (0.67 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note: | |
Num Ratings: 288153, Num Unique Notes Rated: 34270, Num Unique Raters: 5184 | |
INFO:birdwatch.scorer:MFGroupScorer_8 Prepare ratings elapsed time: 0.22 secs (0.00 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)` | |
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey] | |
INFO:birdwatch.helpfulness_scores:Unique Raters: 2661 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 22616 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 2964 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 2661 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 224951 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 224951 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 2661, Notes: 34250 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:initialized global intercept
INFO:birdwatch.matrix_factorization:learning rate set to :0.2
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 6.567912408759124
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 84.53626456219466
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 6.692715111528101
INFO:birdwatch.model:Freezing parameter: user_factors.weight
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight
INFO:birdwatch.model:Freezing parameter: global_intercept
INFO:birdwatch.matrix_factorization:epoch 0 0.15108999609947205
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1044110357761383
INFO:birdwatch.matrix_factorization:epoch 20 0.09869233518838882
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06339141726493835
INFO:birdwatch.matrix_factorization:epoch 40 0.09577610343694687
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.061669427901506424
INFO:birdwatch.matrix_factorization:epoch 20 0.1004631370306015
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07102642953395844
INFO:birdwatch.matrix_factorization:epoch 60 0.09539130330085754
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06174606829881668
INFO:birdwatch.matrix_factorization:Num epochs: 74
INFO:birdwatch.matrix_factorization:epoch 74 0.09534881263971329
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06189499795436859
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17209193110466003
INFO:birdwatch.scorer:MFGroupScorer_8 Final helpfulness-filtered MF elapsed time: 1.71 secs (0.03 mins)
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_8 final scoring, about to call diligence with 224951 final round ratings.
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key]
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key]
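The SettingWithCopyWarnings above (reputation_matrix_factorization.py:467 and :470) fire when a column is assigned on a frame that pandas believes was sliced from another. A minimal sketch of the usual remedy, taking an explicit `.copy()` of the slice before assigning; `fullState` and `inDataset` are hypothetical names, and whether the scorer ever intends to write back to the parent frame is not visible from the log:

```python
import warnings

import pandas as pd

# Hypothetical parent frame standing in for the scorer's full init state.
fullState = pd.DataFrame(
    {
        "noteId": [10, 11, 12],
        "internalNoteIntercept": [0.1, 0.2, 0.3],
        "internalNoteInterceptRound2": [-0.5, -0.6, -0.7],
        "inDataset": [True, False, True],
    }
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # An explicit copy gives noteInitState its own data, so the column
    # assignment below unambiguously targets this frame and does not warn.
    noteInitState = fullState[fullState["inDataset"]].copy()
    noteInitState["internalNoteIntercept"] = noteInitState["internalNoteInterceptRound2"]

print(noteInitState)
print("SettingWithCopy warnings:", sum("SettingWithCopy" in w.category.__name__ for w in caught))
```

The parent frame is left untouched, which is the behavior the warning is asking the author to make explicit.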
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
noteId internalNoteIntercept internalNoteFactor1 \
0 1824145618043871451 -0.435419 2.247933
1 1827427946220380250 -1.301155 1.366825
2 1834946132486529335 -0.438711 1.173314
3 1835060018384630219 -0.332141 2.292651
4 1835787498271781185 -1.123652 0.906726
... ... ... ...
33036 1739331011681407082 -0.181655 0.731855
33037 1860064434762224022 -1.102765 -0.091826
33038 1823212258971013310 -0.398532 0.269370
33039 1775246560772636964 -0.326135 -0.838737
33040 1746387003758002672 -0.426205 -0.665179
internalNoteInterceptRound2
0 -0.435419
1 -1.301155
2 -0.438711
3 -0.332141
4 -1.123652
... ...
33036 -0.181655
33037 -1.102765
33038 -0.398532
33039 -0.326135
33040 -0.426205
[33041 rows x 4 columns],
raterInitState:
raterParticipantId \
0 000045A5FA0CF004F68CBF2913506C37D540CF48522D33...
1 000332634A6A64C51BA706D66615B87D74D34B3465D3CD...
2 0005983E6E18862483AB372C5B61FEBC1F8A573E7701F9...
3 000A0CE0A7410288C107822B15D2B35C5E95715EA946E7...
4 00177CE102355982315EED42EADA601B04A6112E029004...
... ...
22611 FFE894CCE08EAD722CB39396FBE0AFC5E05C9C9B9E3721...
22612 FFEFEEF7E6B2DCB450856DBBB9F7EF303369C610B38A42...
22613 FFF32E6FDAD8CA20E1F78638046B1E3D95B838103AE629...
22614 FFF5A46494A3BDEC6FFF8A38A777E53484648B186FCD76...
22615 FFFEB058BCC25277E2662DD3E8C0506FB1B23BA4D965EA...
internalRaterIntercept internalRaterFactor1 internalRaterReputation \
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
... ... ... ...
22611 NaN NaN NaN
22612 NaN NaN NaN
22613 NaN NaN NaN
22614 NaN NaN NaN
22615 NaN NaN NaN
internalRaterInterceptRound2
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
... ...
22611 NaN
22612 NaN
22613 NaN
22614 NaN
22615 NaN
[22616 rows x 5 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 2661, vs. num we are initializing: 22616
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 2661
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 2661, vs. num we are initializing: 22616
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 2661
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 2661, vs. num we are initializing: 22616
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 2661
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 34250, vs. num we are initializing: 33041
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 33212
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 1038
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 34250, vs. num we are initializing: 33041
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 33212
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 1038
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor)
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=8.746864 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.915614 | time=0.4s
INFO:birdwatch.matrix_factorization:Num epochs: 74
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.862045 | time=0.8s
INFO:birdwatch.matrix_factorization:epoch 74 0.10868552327156067
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.08159460127353668
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.853016 | time=1.2s
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.851246 | time=1.5s
INFO:birdwatch.constants:Pseudo: fit all notes with raters constant elapsed time: 202.43 secs (3.37 mins)
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.850646 | time=1.7s
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.850376 | time=2.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=210 | loss=2.850243 | time=2.2s
INFO:birdwatch.reputation_matrix_factorization:epoch=235 | loss=2.850182 | time=2.4s
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 2661, vs. num we are initializing: 22616
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 2661
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.672999 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.520568 | time=0.3s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.518948 | time=0.5s
INFO:birdwatch.reputation_matrix_factorization:epoch=080 | loss=0.518872 | time=0.7s
INFO:birdwatch.diligence_model:Low diligence final loss: 0.5189
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object')
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1',
'internalNoteFactor1_max', 'internalNoteFactor1_median',
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig',
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig',
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac',
'internalNoteIntercept_max', 'internalNoteIntercept_min',
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'],
dtype='object')
INFO:birdwatch.scorer:MFGroupScorer_8 Low Diligence Reputation Model elapsed time: 3.61 secs (0.06 mins)
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFGroupScorer_8
INFO:birdwatch.matrix_factorization:epoch 40 0.09801814705133438
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06872601062059402
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
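The MERGE ERROR above is flagged as allowed, and the likely mechanism is ordinary pandas behavior: when a left or outer merge leaves some rows without a match, the missing counts become NaN, and NaN cannot live in an int64 column, so the column is upcast to float64. A minimal sketch on toy frames (`notes` and `counts` are illustrative; only the column names come from the log), including the nullable `Int64` dtype as one way to keep integer semantics:

```python
import pandas as pd

# Toy inputs: every note, plus rating counts for only some of them.
notes = pd.DataFrame({"noteId": [1, 2, 3]})
counts = pd.DataFrame({"noteId": [1, 3], "numRatings": [5, 7]})  # int64

# Note 2 has no match, so numRatings gets a NaN and is upcast to float64.
merged = notes.merge(counts, on="noteId", how="left")
print(merged["numRatings"].dtype)  # float64

# With the nullable Int64 extension dtype the gap becomes <NA> instead,
# and the column stays integer-typed.
countsNullable = counts.astype({"numRatings": "Int64"})
mergedNullable = notes.merge(countsNullable, on="noteId", how="left")
print(mergedNullable["numRatings"].dtype)  # Int64
```

Filling the gaps with 0 and casting back with `.astype("int64")` is the other common choice when a missing count genuinely means zero ratings.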
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 494: noteStats = tagAggregates.merge(noteStats, on=c.noteIdKey, how="outer")
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
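The CONCAT errors report `noteId` arriving as float64. Wherever that happens it is more than a type-check nit: note ids in this log, such as 1824145618043871451 from the noteInitState dump above, exceed 2**53, so float64 cannot represent them exactly. The log does not show whether float-typed ids are ever used for joins; a short check with pandas alone shows what a round trip through float64 does to one of them:

```python
import pandas as pd

# A 19-digit noteId copied from the noteInitState dump in this log.
noteId = 1824145618043871451

ids = pd.Series([noteId], dtype="int64")
asFloat = ids.astype("float64")

# float64 has a 53-bit significand; at this magnitude adjacent representable
# values are 256 apart, so the id is rounded and the round trip fails.
roundTrip = asFloat.astype("int64")
print(roundTrip.iloc[0] == noteId)  # False
print(noteId > 2**53)  # True
```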
INFO:birdwatch.constants:compute_scored_notes: compute tag aggregates elapsed time: 2.30 secs (0.04 mins)
INFO:birdwatch.matrix_factorization:epoch 60 0.09777367860078812
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06855513155460358
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:82: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals.drop(columns_to_drop, inplace=True, axis=1)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:84: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = (
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:91: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ratings_w_user_totals[
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:95: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey] = ratings_w_user_totals[
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:99: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.totalRatingsMadeByRaterKey] = ratings_w_user_totals[
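The incorrect_filter.py warnings above repeat one pattern: an in-place `drop` and three chained column-level `fillna(0, inplace=True)` calls on a frame that is itself a slice. A minimal sketch of an alternative that works on an explicit copy and fills all three columns with one dict-form `fillna`, as the FutureWarning suggests; the function name and column strings are hypothetical stand-ins for the constants in the log:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for c.incorrectTagRateByRaterKey,
# c.incorrectTagRatingsMadeByRaterKey and c.totalRatingsMadeByRaterKey.
FILL_COLUMNS = [
    "incorrectTagRateByRater",
    "incorrectTagRatingsMadeByRater",
    "totalRatingsMadeByRater",
]


def fill_rater_totals(ratingsWithTotals: pd.DataFrame, columnsToDrop: list) -> pd.DataFrame:
    """Drop helper columns and zero-fill rater totals without in-place calls on a slice."""
    # drop() without inplace returns a new frame; copy() makes its independence explicit.
    out = ratingsWithTotals.drop(columns=columnsToDrop).copy()
    # One dict-form fillna covers every column instead of three chained calls.
    return out.fillna({col: 0 for col in FILL_COLUMNS})


ratings = pd.DataFrame(
    {
        "raterParticipantId": ["a", "b", "c"],
        "incorrectTagRateByRater": [0.5, np.nan, 0.0],
        "incorrectTagRatingsMadeByRater": [2.0, np.nan, 0.0],
        "totalRatingsMadeByRater": [4.0, np.nan, 3.0],
        "scratch": [1, 2, 3],
    }
)
# Passing a filtered frame mirrors the slice the warnings complain about.
result = fill_rater_totals(ratings[ratings["scratch"] > 0], ["scratch"])
print(result)
```

Returning a new frame rather than mutating the argument also leaves the caller's `ratings` unchanged.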
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 499: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on num_voters_interval: result=float64 expected=int64 (allowed)
INFO:birdwatch.constants:compute_scored_notes: compute incorrect aggregates elapsed time: 2.65 secs (0.04 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.67 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins)
INFO:birdwatch.matrix_factorization:Num epochs: 76
INFO:birdwatch.matrix_factorization:epoch 76 0.09774463623762131
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06854161620140076
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17108562588691711
INFO:birdwatch.scorer:MFGroupScorer_9 Final helpfulness-filtered MF elapsed time: 22.55 secs (0.38 mins)
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_9 final scoring, about to call diligence with 3156203 final round ratings.
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.81 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.69 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.67 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.78 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRHInertia (v1.0)
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key]
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key]
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
noteId internalNoteIntercept internalNoteFactor1 \
0 1642506152822079490 -2.279120 1.164171
1 1644889840532566017 0.454213 2.501363
2 1644890766915796992 -0.994993 1.546992
3 1649616502188912641 -1.138480 0.109131
4 1649621727880839168 -1.210915 0.852176
... ... ... ...
157400 1769340856199221523 -0.265732 0.794307
157401 1713350331864625656 1.330089 1.768102
157402 1836274146642153547 -0.267058 -0.643837
157403 1767562540320543193 -0.296659 -0.508060
157404 1872611622075723818 -0.426384 -0.310856
internalNoteInterceptRound2
0 -2.279120
1 0.454213
2 -0.994993
3 -1.138480
4 -1.210915
... ...
157400 -0.265732
157401 1.330089
157402 -0.267058
157403 -0.296659
157404 -0.426384
[157405 rows x 4 columns],
raterInitState:
raterParticipantId \
0 000045A5FA0CF004F68CBF2913506C37D540CF48522D33...
1 00018DBB934257251EBCEE91D0722C71B7DD592A571398...
2 0002725E706CF18C040E21F30CE2D39994513C3BB8CF58...
3 00029D1FDD352D79B5073189C3F2BDF6377581F50D66C1...
4 0002D1E11A8EA1E4B25048FA9D117406CE9EB1D3143BC9...
... ...
89423 FFFDAB98EE31EC0CC51169937F859D5B676870C6470C19...
89424 FFFEB058BCC25277E2662DD3E8C0506FB1B23BA4D965EA...
89425 FFFEB27D6E27351D14EB43777F265F694744ABB4B3B7AD...
89426 FFFEB3E291D915645E08FD13A9BFE66B5912FE45306D25...
89427 FFFF8C877BDC3CEFEFD0D4C5F0E8B4BE537F5023A1F31F...
internalRaterIntercept internalRaterFactor1 internalRaterReputation \
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 -0.888680 -1.365593 0.286359
4 NaN NaN NaN
... ... ... ...
89423 0.137015 0.673040 0.569525
89424 NaN NaN NaN
89425 NaN NaN NaN
89426 0.126170 -0.555323 0.160850
89427 0.300415 -0.510434 0.437841
internalRaterInterceptRound2
0 NaN
1 NaN
2 NaN
3 -0.888680
4 NaN
... ...
89423 0.137015
89424 NaN
89425 NaN
89426 0.126170
89427 0.300415
[89428 rows x 5 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 28355, vs. num we are initializing: 89428
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 28355
INFO:birdwatch.constants:Calling score_notes: GeneralCRHInertia (v1.0) elapsed time: 1.09 secs (0.02 mins)
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 28355, vs. num we are initializing: 89428
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 28355
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 28355, vs. num we are initializing: 89428
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 28355
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 158479, vs. num we are initializing: 157405
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 154362
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 4117
INFO:birdwatch.constants:Applying scoring rule: GeneralCRHInertia (v1.0) elapsed time: 1.73 secs (0.03 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: TagFilter (v1.0)
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 158479, vs. num we are initializing: 157405
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 154362
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 4117
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor)
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=4.820239 | time=0.1s
INFO:birdwatch.pseudo_raters:------------------
INFO:birdwatch.pseudo_raters:Re-scoring all notes with extra rating added: {'raterParticipantId': '-1', 'raterIndex': 377369, 'internalRaterIntercept': -0.48738948, 'internalRaterFactor1': -1.1244862, 'helpfulNum': 1.0}
INFO:birdwatch.scoring_rules:Candidate notes prior to tag filtering: 1748247
INFO:birdwatch.scoring_rules:Checking note tags:
INFO:birdwatch.scoring_rules:notHelpfulOther
INFO:birdwatch.scoring_rules:notHelpfulIncorrect
INFO:birdwatch.scoring_rules:notHelpfulSourcesMissingOrUnreliable
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculationOrBias
INFO:birdwatch.scoring_rules:notHelpfulMissingKeyPoints
INFO:birdwatch.scoring_rules:notHelpfulOutdated
INFO:birdwatch.scoring_rules:notHelpfulHardToUnderstand
INFO:birdwatch.scoring_rules:outlier filtering disabled for tag: notHelpfulHardToUnderstand
INFO:birdwatch.scoring_rules:notHelpfulArgumentativeOrBiased
INFO:birdwatch.scoring_rules:notHelpfulOffTopic
INFO:birdwatch.scoring_rules:notHelpfulSpamHarassmentOrAbuse
INFO:birdwatch.scoring_rules:notHelpfulIrrelevantSources
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculation
INFO:birdwatch.scoring_rules:notHelpfulNoteNotNeeded
INFO:birdwatch.scoring_rules:Total {note, tag} pairs where tag filter logic triggered: 445
INFO:birdwatch.scoring_rules:Total unique notes impacted by tag filtering: 311
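The two TagFilter totals above count different things: 445 {note, tag} pairs fall on 311 distinct notes, so some notes tripped the filter on more than one tag. A toy illustration with made-up data, assuming one row per triggered pair (the actual frame layout inside scoring_rules.py is not shown in the log):

```python
import pandas as pd

# Made-up (noteId, tag) pairs where the tag filter fired.
triggered = pd.DataFrame(
    {
        "noteId": [101, 101, 102, 103, 103, 103],
        "tag": [
            "notHelpfulIncorrect",
            "notHelpfulOutdated",
            "notHelpfulOffTopic",
            "notHelpfulIncorrect",
            "notHelpfulMissingKeyPoints",
            "notHelpfulOpinionSpeculation",
        ],
    }
)

pairCount = len(triggered)  # one row per {note, tag} pair
uniqueNotes = triggered["noteId"].nunique()  # notes with at least one triggered tag
print(pairCount, uniqueNotes)  # 6 3
```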
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.310171 | time=2.9s
INFO:birdwatch.constants:Calling score_notes: TagFilter (v1.0) elapsed time: 3.11 secs (0.05 mins)
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scoring_rules.py, in apply_scoring_rules, at line 1099: noteColumns = noteColumns.merge(
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
INFO:birdwatch.constants:Applying scoring rule: TagFilter (v1.0) elapsed time: 3.79 secs (0.06 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: CRHSuperThreshold (v1.0)
INFO:birdwatch.constants:Calling score_notes: CRHSuperThreshold (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 377370, Notes: 1205353
INFO:birdwatch.matrix_factorization:initializing notes
INFO:birdwatch.constants:Applying scoring rule: CRHSuperThreshold (v1.0) elapsed time: 0.80 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: ElevatedCRHInertia (v1.0)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:initialized global intercept
INFO:birdwatch.matrix_factorization:learning rate set to :0.2
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.model:Freezing parameter: global_intercept
INFO:birdwatch.model:Freezing parameter: user_factors.weight
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight
INIT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/pseudo_raters.py, in _check_note_parameters_same, at line 90: assert (noteParamsFromNewModel == self.noteParams).all().all()
PandasTypeError: Type expectation mismatch on noteId: found=bool expected=int64
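One plausible reading of the `found=bool expected=int64` message above: `noteParamsFromNewModel == self.noteParams` is an elementwise comparison that returns a frame of booleans with the same columns, so `noteId` itself becomes a bool column before `.all().all()` reduces it. A minimal sketch on toy frames (values and column names are illustrative), plus `pandas.testing.assert_frame_equal` as an alternative that names the differing column and tolerates small float differences:

```python
import pandas as pd

# Toy note-parameter frames; values and column names are illustrative.
noteParams = pd.DataFrame(
    {"noteId": [1, 2], "internalNoteIntercept": [0.25, -0.5], "internalNoteFactor1": [0.1, 0.3]}
)
noteParamsFromNewModel = noteParams.copy()

# Elementwise == yields a bool frame, including a bool noteId column;
# .all().all() then collapses it to a single bool.
comparison = noteParamsFromNewModel == noteParams
print(comparison.dtypes)
assert comparison.all().all()

# Alternative check: raises with a per-column message on mismatch and
# compares floats with a tolerance when check_exact=False.
pd.testing.assert_frame_equal(noteParamsFromNewModel, noteParams, check_exact=False)
```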
INFO:birdwatch.constants:Calling score_notes: ElevatedCRHInertia (v1.0) elapsed time: 0.93 secs (0.02 mins)
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.282616 | time=5.9s
INFO:birdwatch.constants:Applying scoring rule: ElevatedCRHInertia (v1.0) elapsed time: 1.64 secs (0.03 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterIncorrect (v1.0)
INFO:birdwatch.scoring_rules:Total notes impacted by incorrect filtering: 72
INFO:birdwatch.matrix_factorization:epoch 0 0.16270163655281067
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1244073361158371
INFO:birdwatch.constants:Calling score_notes: FilterIncorrect (v1.0) elapsed time: 2.98 secs (0.05 mins)
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.280400 | time=9.2s
INFO:birdwatch.constants:Applying scoring rule: FilterIncorrect (v1.0) elapsed time: 3.68 secs (0.06 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLowDiligence (v1.0)
INFO:birdwatch.reputation_matrix_factorization:epoch=115 | loss=2.280273 | time=11.6s
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 28355, vs. num we are initializing: 89428
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 28355
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.355155 | time=0.1s
INFO:birdwatch.scoring_rules:Total notes impacted by low diligence filtering: 6577
INFO:birdwatch.constants:Calling score_notes: FilterLowDiligence (v1.0) elapsed time: 2.98 secs (0.05 mins)
INFO:birdwatch.constants:Applying scoring rule: FilterLowDiligence (v1.0) elapsed time: 3.64 secs (0.06 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLargeFactor (v1.0)
INFO:birdwatch.scoring_rules:Total notes impacted by large factor filtering: 84
INFO:birdwatch.constants:Calling score_notes: FilterLargeFactor (v1.0) elapsed time: 0.85 secs (0.01 mins)
INFO:birdwatch.constants:Applying scoring rule: FilterLargeFactor (v1.0) elapsed time: 1.53 secs (0.03 mins)
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.339918 | time=3.5s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.419287 | time=92.2s
INFO:birdwatch.reputation_matrix_factorization:epoch=055 | loss=0.339748 | time=5.8s
INFO:birdwatch.diligence_model:Low diligence final loss: 0.3397
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object')
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1',
'internalNoteFactor1_max', 'internalNoteFactor1_median',
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig',
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig',
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac',
'internalNoteIntercept_max', 'internalNoteIntercept_min',
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'],
dtype='object')
INFO:birdwatch.scorer:MFGroupScorer_9 Low Diligence Reputation Model elapsed time: 21.87 secs (0.36 mins)
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFGroupScorer_9
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=0.007131 | time=184.7s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=0.007131 | time=184.7s | |
INFO:birdwatch.helpfulness_model:Helpfulness reputation loss: 0.0071 | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/reputation_scorer.py, in _score_notes_and_users, at line 187: noteStats = noteStats.merge(noteStatusHistory[[c.noteIdKey]].drop_duplicates(), how="outer") | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.scorer:Postprocessing output for ReputationScorer | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.419047 | time=95.6s | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
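These MERGE ERROR lines report int64 count columns (numRatings, numRatingsLast28) coming out of a merge as float64. The type checker marks this case as allowed. The sketch below reproduces the likely mechanism on invented data. When a left or outer merge leaves a note without a matching row, pandas fills the gap with NaN, and a NumPy int64 column cannot hold NaN. Only the column name numRatings mirrors the log; the frames and values are made up.

```python
import pandas as pd

notes = pd.DataFrame({"noteId": [101, 102, 103]})
rating_counts = pd.DataFrame({"noteId": [101, 102], "numRatings": [5, 7]})

# Note 103 has no ratings, so the left merge needs a missing value in numRatings.
# NumPy int64 has no missing-value marker, so pandas upcasts the column to float64.
plain = notes.merge(rating_counts, on="noteId", how="left")
print(plain["numRatings"].dtype)  # float64

# Option A: cast to pandas' nullable "Int64" extension dtype before merging.
# The gap becomes <NA> and the column stays integer-typed.
nullable = notes.merge(
    rating_counts.astype({"numRatings": "Int64"}), on="noteId", how="left"
)
print(nullable["numRatings"].dtype)  # Int64

# Option B: if "no matching row" genuinely means zero, fill and cast back after the merge.
filled = plain.assign(numRatings=plain["numRatings"].fillna(0).astype("int64"))
print(filled["numRatings"].tolist())  # [5, 7, 0]
```

Option B changes the meaning of the data. It is only a fit where a missing count and a zero count are interchangeable.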
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 494: noteStats = tagAggregates.merge(noteStats, on=c.noteIdKey, how="outer")
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
INFO:birdwatch.constants:compute_scored_notes: compute tag aggregates elapsed time: 9.35 secs (0.16 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:82: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals.drop(columns_to_drop, inplace=True, axis=1)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:84: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = (
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:91: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ratings_w_user_totals[
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:95: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey] = ratings_w_user_totals[
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:99: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.totalRatingsMadeByRaterKey] = ratings_w_user_totals[
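The FutureWarnings from incorrect_filter.py, and the identical ones later from matrix_factorization.py, each name two replacements for a column-level fillna(..., inplace=True). The sketch below applies both to a made-up frame. Its column names are only loosely modeled on the log keys and are hypothetical.

```python
import pandas as pd

totals = pd.DataFrame(
    {
        "raterId": ["a", "b", "c"],
        "incorrectTagRate": [0.25, None, 0.5],
        "totalRatingsMade": [4.0, None, 2.0],
    }
)

# Flagged pattern (per the warning, it stops working in pandas 3.0):
#     totals["incorrectTagRate"].fillna(0, inplace=True)

# Option 1 from the warning: assign the filled Series back to the column.
totals["incorrectTagRate"] = totals["incorrectTagRate"].fillna(0)

# Option 2 from the warning: call fillna on the DataFrame with a {column: value} map.
totals.fillna({"totalRatingsMade": 0}, inplace=True)

print(totals)
```

In the log, ratings_w_user_totals also triggers SettingWithCopyWarning. Either rewrite would therefore likely need to be paired with an explicit .copy() at the point where that frame is created; that pairing is an assumption, because the creating code is not shown.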
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 499: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on num_voters_interval: result=float64 expected=int64 (allowed)
INFO:birdwatch.constants:compute_scored_notes: compute incorrect aggregates elapsed time: 7.41 secs (0.12 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.63 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.12 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.80 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.03 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.73 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.60 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.76 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRHInertia (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRHInertia (v1.0) elapsed time: 1.01 secs (0.02 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRHInertia (v1.0) elapsed time: 1.67 secs (0.03 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: TagFilter (v1.0)
INFO:birdwatch.scoring_rules:Candidate notes prior to tag filtering: 1744475
INFO:birdwatch.scoring_rules:Checking note tags:
INFO:birdwatch.scoring_rules:notHelpfulOther
INFO:birdwatch.scoring_rules:notHelpfulIncorrect
INFO:birdwatch.scoring_rules:notHelpfulSourcesMissingOrUnreliable
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculationOrBias
INFO:birdwatch.scoring_rules:notHelpfulMissingKeyPoints
INFO:birdwatch.scoring_rules:notHelpfulOutdated
INFO:birdwatch.scoring_rules:notHelpfulHardToUnderstand
INFO:birdwatch.scoring_rules:outlier filtering disabled for tag: notHelpfulHardToUnderstand
INFO:birdwatch.scoring_rules:notHelpfulArgumentativeOrBiased
INFO:birdwatch.scoring_rules:notHelpfulOffTopic
INFO:birdwatch.scoring_rules:notHelpfulSpamHarassmentOrAbuse
INFO:birdwatch.scoring_rules:notHelpfulIrrelevantSources
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculation
INFO:birdwatch.scoring_rules:notHelpfulNoteNotNeeded
INFO:birdwatch.scoring_rules:Total {note, tag} pairs where tag filter logic triggered: 12041
INFO:birdwatch.scoring_rules:Total unique notes impacted by tag filtering: 6304
INFO:birdwatch.constants:Calling score_notes: TagFilter (v1.0) elapsed time: 3.17 secs (0.05 mins)
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scoring_rules.py, in apply_scoring_rules, at line 1099: noteColumns = noteColumns.merge(
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
INFO:birdwatch.constants:Applying scoring rule: TagFilter (v1.0) elapsed time: 3.87 secs (0.06 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: CRHSuperThreshold (v1.0)
INFO:birdwatch.constants:Calling score_notes: CRHSuperThreshold (v1.0) elapsed time: 0.13 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: CRHSuperThreshold (v1.0) elapsed time: 0.83 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: ElevatedCRHInertia (v1.0)
INFO:birdwatch.constants:Calling score_notes: ElevatedCRHInertia (v1.0) elapsed time: 0.98 secs (0.02 mins)
INFO:birdwatch.constants:Applying scoring rule: ElevatedCRHInertia (v1.0) elapsed time: 1.62 secs (0.03 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterIncorrect (v1.0)
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 40.16 secs (0.67 mins)
INFO:birdwatch.mf_base_scorer:sn cols: Index(['noteId', 'ratingWeight', 'notHelpfulOtherAdjusted',
'notHelpfulIncorrectAdjusted',
'notHelpfulSourcesMissingOrUnreliableAdjusted',
'notHelpfulOpinionSpeculationOrBiasAdjusted',
'notHelpfulMissingKeyPointsAdjusted', 'notHelpfulOutdatedAdjusted',
'notHelpfulHardToUnderstandAdjusted',
'notHelpfulArgumentativeOrBiasedAdjusted', 'notHelpfulOffTopicAdjusted',
'notHelpfulSpamHarassmentOrAbuseAdjusted',
'notHelpfulIrrelevantSourcesAdjusted',
'notHelpfulOpinionSpeculationAdjusted',
'notHelpfulNoteNotNeededAdjusted', 'notHelpfulOtherAdjustedRatio',
'notHelpfulIncorrectAdjustedRatio',
'notHelpfulSourcesMissingOrUnreliableAdjustedRatio',
'notHelpfulOpinionSpeculationOrBiasAdjustedRatio',
'notHelpfulMissingKeyPointsAdjustedRatio',
'notHelpfulOutdatedAdjustedRatio',
'notHelpfulHardToUnderstandAdjustedRatio',
'notHelpfulArgumentativeOrBiasedAdjustedRatio',
'notHelpfulOffTopicAdjustedRatio',
'notHelpfulSpamHarassmentOrAbuseAdjustedRatio',
'notHelpfulIrrelevantSourcesAdjustedRatio',
'notHelpfulOpinionSpeculationAdjustedRatio',
'notHelpfulNoteNotNeededAdjustedRatio', 'helpfulOther',
'helpfulInformative', 'helpfulClear', 'helpfulEmpathetic',
'helpfulGoodSources', 'helpfulUniqueContext', 'helpfulAddressesClaim',
'helpfulImportantContext', 'helpfulUnbiasedLanguage', 'notHelpfulOther',
'notHelpfulIncorrect', 'notHelpfulSourcesMissingOrUnreliable',
'notHelpfulOpinionSpeculationOrBias', 'notHelpfulMissingKeyPoints',
'notHelpfulOutdated', 'notHelpfulHardToUnderstand',
'notHelpfulArgumentativeOrBiased', 'notHelpfulOffTopic',
'notHelpfulSpamHarassmentOrAbuse', 'notHelpfulIrrelevantSources',
'notHelpfulOpinionSpeculation', 'notHelpfulNoteNotNeeded', 'numRatings',
'noteAuthorParticipantId', 'classification', 'currentStatus',
'internalNoteIntercept', 'internalNoteFactor1',
'lowDiligenceNoteIntercept', 'internalNoteFactor1_max',
'internalNoteFactor1_median', 'internalNoteFactor1_min',
'internalNoteFactor1_refit_orig', 'internalNoteIntercept_median',
'internalNoteIntercept_refit_orig', 'ratingCount_all',
'ratingCount_neg_fac', 'ratingCount_pos_fac',
'internalNoteIntercept_max', 'internalNoteIntercept_min',
'notHelpfulIncorrect_interval', 'p_incorrect_user_interval',
'num_voters_interval', 'tf_idf_incorrect_interval',
'internalRatingStatus', 'internalActiveRules', 'activeFilterTags',
'crhBool', 'crnhBool', 'awaitingBool'],
dtype='object')
INFO:birdwatch.scorer:MFGroupScorer_8 Final compute scored notes elapsed time: 70.34 secs (1.17 mins)
INFO:birdwatch.scorer:Postprocessing output for MFGroupScorer_8
INFO:birdwatch.scoring_rules:Total notes impacted by incorrect filtering: 2056
INFO:birdwatch.constants:Calling score_notes: FilterIncorrect (v1.0) elapsed time: 2.91 secs (0.05 mins)
INFO:birdwatch.constants:Applying scoring rule: FilterIncorrect (v1.0) elapsed time: 3.54 secs (0.06 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLowDiligence (v1.0)
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.418124 | time=135.3s
INFO:birdwatch.scoring_rules:Total notes impacted by low diligence filtering: 31106
INFO:birdwatch.constants:Calling score_notes: FilterLowDiligence (v1.0) elapsed time: 2.88 secs (0.05 mins)
INFO:birdwatch.constants:Applying scoring rule: FilterLowDiligence (v1.0) elapsed time: 3.58 secs (0.06 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLargeFactor (v1.0)
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.417884 | time=137.4s
INFO:birdwatch.scoring_rules:Total notes impacted by large factor filtering: 46
INFO:birdwatch.constants:Calling score_notes: FilterLargeFactor (v1.0) elapsed time: 0.83 secs (0.01 mins)
INFO:birdwatch.constants:Applying scoring rule: FilterLargeFactor (v1.0) elapsed time: 1.52 secs (0.03 mins)
INFO:birdwatch.matrix_factorization:epoch 20 0.13378728926181793
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10396137833595276
INFO:birdwatch.scorer: Original noteScores length: 1750506
INFO:birdwatch.scorer: Final noteScores length: 1394865
INFO:birdwatch.scorer:ReputationScorer Postprocess output elapsed time: 54.80 secs (0.91 mins)
INFO:birdwatch.run_scoring:MFGroupScorer_7 run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.reputation_matrix_factorization:epoch=105 | loss=2.418079 | time=157.5s
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept
INFO:birdwatch.reputation_matrix_factorization:epoch=105 | loss=2.417839 | time=158.7s
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 489451, vs. num we are initializing: 712504
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 489800, vs. num we are initializing: 712511
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 489451
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 489800
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.257332 | time=0.6s
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.257032 | time=0.6s
INFO:birdwatch.run_scoring:MFGroupScorer_7 run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFGroupScorer_7 run_scorer_parallelizable: Loading data elapsed time: 24.00 secs (0.40 mins)
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_7 set to: 4
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_7. Original rating length: 118317340
INFO:birdwatch.scorer: Ratings after topic filter: 118317340
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 39.44 secs (0.66 mins)
INFO:birdwatch.mf_base_scorer:sn cols: Index(['noteId', 'ratingWeight', 'notHelpfulOtherAdjusted',
'notHelpfulIncorrectAdjusted',
'notHelpfulSourcesMissingOrUnreliableAdjusted',
'notHelpfulOpinionSpeculationOrBiasAdjusted',
'notHelpfulMissingKeyPointsAdjusted', 'notHelpfulOutdatedAdjusted',
'notHelpfulHardToUnderstandAdjusted',
'notHelpfulArgumentativeOrBiasedAdjusted', 'notHelpfulOffTopicAdjusted',
'notHelpfulSpamHarassmentOrAbuseAdjusted',
'notHelpfulIrrelevantSourcesAdjusted',
'notHelpfulOpinionSpeculationAdjusted',
'notHelpfulNoteNotNeededAdjusted', 'notHelpfulOtherAdjustedRatio',
'notHelpfulIncorrectAdjustedRatio',
'notHelpfulSourcesMissingOrUnreliableAdjustedRatio',
'notHelpfulOpinionSpeculationOrBiasAdjustedRatio',
'notHelpfulMissingKeyPointsAdjustedRatio',
'notHelpfulOutdatedAdjustedRatio',
'notHelpfulHardToUnderstandAdjustedRatio',
'notHelpfulArgumentativeOrBiasedAdjustedRatio',
'notHelpfulOffTopicAdjustedRatio',
'notHelpfulSpamHarassmentOrAbuseAdjustedRatio',
'notHelpfulIrrelevantSourcesAdjustedRatio',
'notHelpfulOpinionSpeculationAdjustedRatio',
'notHelpfulNoteNotNeededAdjustedRatio', 'helpfulOther',
'helpfulInformative', 'helpfulClear', 'helpfulEmpathetic',
'helpfulGoodSources', 'helpfulUniqueContext', 'helpfulAddressesClaim',
'helpfulImportantContext', 'helpfulUnbiasedLanguage', 'notHelpfulOther',
'notHelpfulIncorrect', 'notHelpfulSourcesMissingOrUnreliable',
'notHelpfulOpinionSpeculationOrBias', 'notHelpfulMissingKeyPoints',
'notHelpfulOutdated', 'notHelpfulHardToUnderstand',
'notHelpfulArgumentativeOrBiased', 'notHelpfulOffTopic',
'notHelpfulSpamHarassmentOrAbuse', 'notHelpfulIrrelevantSources',
'notHelpfulOpinionSpeculation', 'notHelpfulNoteNotNeeded', 'numRatings',
'noteAuthorParticipantId', 'classification', 'currentStatus',
'internalNoteIntercept', 'internalNoteFactor1',
'lowDiligenceNoteIntercept', 'internalNoteFactor1_max',
'internalNoteFactor1_median', 'internalNoteFactor1_min',
'internalNoteFactor1_refit_orig', 'internalNoteIntercept_median',
'internalNoteIntercept_refit_orig', 'ratingCount_all',
'ratingCount_neg_fac', 'ratingCount_pos_fac',
'internalNoteIntercept_max', 'internalNoteIntercept_min',
'notHelpfulIncorrect_interval', 'p_incorrect_user_interval',
'num_voters_interval', 'tf_idf_incorrect_interval',
'internalRatingStatus', 'internalActiveRules', 'activeFilterTags',
'crhBool', 'crnhBool', 'awaitingBool'],
dtype='object')
INFO:birdwatch.scorer:MFGroupScorer_9 Final compute scored notes elapsed time: 85.50 secs (1.43 mins)
INFO:birdwatch.scorer:Postprocessing output for MFGroupScorer_9
INFO:birdwatch.scorer: Original noteScores length: 1750506
INFO:birdwatch.scorer: Final noteScores length: 799
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in score_final, at line 1190: noteScores = noteScores.merge(
PandasTypeError: Output mismatch on numFinalRoundRatings: result=float64 expected=int64 (allowed)
INFO:birdwatch.scorer:MFGroupScorer_8 Postprocess output elapsed time: 59.28 secs (0.99 mins)
INFO:birdwatch.run_scoring:MFGroupScorer_6 run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.matrix_factorization:Num epochs: 40
INFO:birdwatch.matrix_factorization:epoch 40 0.13263684511184692
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10269863158464432
INFO:birdwatch.constants:Pseudo: fit all notes with raters constant elapsed time: 114.47 secs (1.91 mins)
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.250835 | time=41.2s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.251126 | time=42.3s
INFO:birdwatch.pseudo_raters:------------------
INFO:birdwatch.pseudo_raters:Re-scoring all notes with extra rating added: {'raterParticipantId': '-2', 'raterIndex': 377370, 'internalRaterIntercept': -0.48738948, 'internalRaterFactor1': 0.0, 'helpfulNum': 1.0}
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 377370, Notes: 1205353
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:initialized global intercept
INFO:birdwatch.matrix_factorization:learning rate set to :0.2
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.model:Freezing parameter: global_intercept
INFO:birdwatch.model:Freezing parameter: user_factors.weight
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight
INIT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/pseudo_raters.py, in _check_note_parameters_same, at line 90: assert (noteParamsFromNewModel == self.noteParams).all().all()
PandasTypeError: Type expectation mismatch on noteId: found=bool expected=int64
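The INIT ERROR comes from an assertion that compares two parameter frames with ==. An element-wise comparison returns a DataFrame of booleans. That would explain why the type checker sees noteId as bool rather than int64, although this reading is an inference from the message, not something the log confirms. The sketch below, on invented data, shows two checks that give a single result instead. It also shows how NaN is handled, which matters only if the parameter frames can contain NaN.

```python
import numpy as np
import pandas as pd

old_params = pd.DataFrame({"noteId": [1, 2], "internalNoteIntercept": [0.3, np.nan]})
new_params = old_params.copy()

# Element-wise == builds a bool DataFrame (noteId included), and NaN != NaN,
# so two identical frames that contain NaN compare as "different".
elementwise = bool((new_params == old_params).all().all())
print(elementwise)  # False

# DataFrame.equals returns one bool and treats NaNs in the same position as equal.
print(new_params.equals(old_params))  # True

# assert_frame_equal raises AssertionError describing the first mismatch, if any.
pd.testing.assert_frame_equal(new_params, old_params)
```

DataFrame.equals also requires matching column dtypes. A frame whose noteId was upcast to float64 would therefore compare unequal even when the values agree.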
INFO:birdwatch.run_scoring:MFGroupScorer_6 run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFGroupScorer_6 run_scorer_parallelizable: Loading data elapsed time: 23.11 secs (0.39 mins)
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_6 set to: 4
INFO:birdwatch.scorer: Ratings without assigned group: 0
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_6. Original rating length: 118317340
INFO:birdwatch.scorer: Ratings after topic filter: 118317340
INFO:birdwatch.matrix_factorization:epoch 0 0.16573211550712585
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.12750285863876343
INFO:birdwatch.scorer: Ratings after group filter: 1747969
INFO:birdwatch.scorer:MFGroupScorer_7 Filter input elapsed time: 44.55 secs (0.74 mins)
INFO:birdwatch.mf_base_scorer:seeding with 0
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note:
Num Ratings: 1238501, Num Unique Notes Rated: 81253, Num Unique Raters: 29029
INFO:birdwatch.scorer:MFGroupScorer_7 Prepare ratings elapsed time: 0.63 secs (0.01 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey]
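The downcasting FutureWarning from helpfulness_scores.py:221 fires when fillna(False) is applied to an object-dtype column and pandas silently converts the result to bool. The warning itself suggests infer_objects(copy=False) or the future.no_silent_downcasting option. The sketch below shows a third, explicit route through pandas' nullable "boolean" dtype. It assumes, without confirmation from the log, that the column holds only True, False and missing values; the Series is invented.

```python
import pandas as pd

# After a merge, a boolean flag column can end up as object dtype with missing values.
above_threshold = pd.Series([True, None, False], dtype=object)

# Flagged pattern: above_threshold.fillna(False) silently downcasts object -> bool.
# Going through the nullable "boolean" dtype makes every conversion explicit:
# object -> boolean (None becomes <NA>) -> fill <NA> with False -> NumPy bool.
flags = above_threshold.astype("boolean").fillna(False).astype(bool)
print(flags.tolist())  # [True, False, False]
```

If the column could contain non-boolean values, astype("boolean") raises a TypeError rather than guessing. This makes it a stricter check than the silent downcast.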
INFO:birdwatch.reputation_matrix_factorization:epoch=045 | loss=0.250785 | time=60.8s
INFO:birdwatch.diligence_model:Low diligence final loss: 0.2508
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object')
INFO:birdwatch.reputation_matrix_factorization:epoch=045 | loss=0.251077 | time=62.1s
INFO:birdwatch.helpfulness_scores:Unique Raters: 12367
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 56374
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 14661
INFO:birdwatch.diligence_model:Low diligence final loss: 0.2511
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object')
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1',
'internalNoteFactor1_max', 'internalNoteFactor1_median',
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig',
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig',
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac',
'internalNoteIntercept_max', 'internalNoteIntercept_min',
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'],
dtype='object')
INFO:birdwatch.scorer:MFExpansionPlusScorer Low Diligence Reputation Model elapsed time: 312.24 secs (5.20 mins)
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 12367
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 876487
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 876487
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFExpansionPlusScorer
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 12367, Notes: 81195
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:initialized global intercept
INFO:birdwatch.matrix_factorization:learning rate set to :0.2
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 10.79483958371821
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 70.87304924395569
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 10.970841516845445
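The two ratios logged just above agree with plain quotients of counts printed earlier in the same MFGroupScorer_7 block: 876487 final-training ratings, 12367 users and 81195 notes. The sketch below is only a consistency check and assumes those counts are the inputs the scorer divided. The log does not show how the corrected loss ratio of 10.97 is derived, so that figure is not reproduced.

```python
# Counts copied from earlier MFGroupScorer_7 lines in this log (assumed to be the divisor inputs).
final_ratings = 876487  # "Number of Ratings for Final Training: 876487"
users = 12367           # "Users: 12367, Notes: 81195"
notes = 81195

ratings_per_note = final_ratings / notes
ratings_per_user = final_ratings / users
print(f"{ratings_per_note:.4f} {ratings_per_user:.4f}")  # 10.7948 70.8730
```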
INFO:birdwatch.model:Freezing parameter: user_factors.weight
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight
INFO:birdwatch.model:Freezing parameter: global_intercept
INFO:birdwatch.matrix_factorization:epoch 0 0.1506255865097046
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10812779515981674
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1',
'internalNoteFactor1_max', 'internalNoteFactor1_median',
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig',
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig',
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac',
'internalNoteIntercept_max', 'internalNoteIntercept_min',
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'],
dtype='object')
INFO:birdwatch.scorer:MFExpansionScorer Low Diligence Reputation Model elapsed time: 312.15 secs (5.20 mins)
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFExpansionScorer
INFO:birdwatch.matrix_factorization:epoch 20 0.11070583015680313
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07883638888597488
INFO:birdwatch.matrix_factorization:epoch 40 0.10865917056798935
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07696988433599472
INFO:birdwatch.matrix_factorization:epoch 60 0.10841184109449387
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07680679112672806
INFO:birdwatch.matrix_factorization:Num epochs: 71
INFO:birdwatch.matrix_factorization:epoch 71 0.10838818550109863
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07678954303264618
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1820950210094452
INFO:birdwatch.scorer:MFGroupScorer_7 Final helpfulness-filtered MF elapsed time: 5.17 secs (0.09 mins)
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_7 final scoring, about to call diligence with 876487 final round ratings.
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key] | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key] | |
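These SettingWithCopyWarnings fire when a column is assigned on a frame that pandas suspects is a slice of another frame. The log does not show how `noteInitState` and `raterInitState` are built. If they come from filtering a larger frame (an assumption), a common fix is to take an explicit `.copy()` when selecting. The sketch below uses made-up data:

```python
import pandas as pd

# Made-up frame; column names only mirror the log.
notes = pd.DataFrame({
    "noteId": [1, 2, 3],
    "internalNoteIntercept": [0.0, 0.0, 0.0],
    "internalNoteInterceptRound2": [0.2, -1.9, 2.2],
})

# A boolean-mask selection may be a view or a copy; writing into it warns.
# An explicit copy makes noteInitState an independent frame.
noteInitState = notes[notes["noteId"] > 1].copy()
noteInitState["internalNoteIntercept"] = noteInitState["internalNoteInterceptRound2"]
```

The original `notes` frame keeps its values, so the assignment's intent is unambiguous.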
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
noteId ... internalNoteInterceptRound2
0 1783356907102572640 ... 0.208783
1 1817651698652926356 ... -1.930698
2 1819455029834555469 ... -2.641512
3 1819460608976118232 ... 2.181850
4 1832658971833806987 ... 0.118507
... ... ... ...
79423 1719110901046133082 ... -0.284150
79424 1761711989527650551 ... -0.413572
79425 1765805801992528269 ... -0.413611
79426 1739385025395569106 ... 0.563089
79427 1774629591619096646 ... -0.333901
[79428 rows x 4 columns],
raterInitState:
raterParticipantId ... internalRaterInterceptRound2
0 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... ... NaN
1 0001C21FD89AC65310D4D74174C0986CDF457DA24DADAB... ... 0.075967
2 00029D1FDD352D79B5073189C3F2BDF6377581F50D66C1... ... NaN
3 0003E67BB62E658363186A00B13637CF1A58748C4E4ECE... ... -0.153998
4 00053CDCAC04E3692F4A01305C8F3D093CCE221157D539... ... NaN
... ... ... ...
56369 FFF7636C99E1370B663778061CD0AF5458555FDA579F88... ... NaN
56370 FFFA43EFB0AAB3BFD273666FF123BFE69D863B9A2F5E44... ... NaN
56371 FFFBC05DB8408BB532985642C4DE00EC619B062CB60E2E... ... -0.064314
56372 FFFC011F23086D8153F0A3FF336F33EE80521EC35F9ACD... ... NaN
56373 FFFDAB98EE31EC0CC51169937F859D5B676870C6470C19... ... NaN
[56374 rows x 5 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 12367, vs. num we are initializing: 56374
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 12367
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 12367, vs. num we are initializing: 56374
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 12367
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 12367, vs. num we are initializing: 56374
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 12367
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 81195, vs. num we are initializing: 79428
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 79242
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 1953
INFO:birdwatch.scorer: Original noteScores length: 1750506
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 81195, vs. num we are initializing: 79428
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 79242
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 1953
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor)
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=5.754270 | time=0.0s
INFO:birdwatch.scorer: Final noteScores length: 47624
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.879617 | time=0.8s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.847818 | time=1.5s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.844625 | time=2.3s
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in score_final, at line 1190: noteScores = noteScores.merge(
PandasTypeError: Output mismatch on numFinalRoundRatings: result=float64 expected=int64 (allowed)
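The `(allowed)` MERGE ERROR entries record an integer column that comes out of a merge as float64. The merge arguments are cut off in the log, so the cause is an assumption here. One common cause is a left or outer merge in which some rows have no match: pandas fills the gaps with NaN, which plain int64 cannot hold. The sketch below reproduces this with made-up data and shows pandas' nullable `Int64` dtype as one way to keep the column integer-typed:

```python
import pandas as pd

# Made-up data: note 2 has no final-round count.
noteScores = pd.DataFrame({"noteId": [1, 2, 3]})
counts = pd.DataFrame({"noteId": [1, 3], "numFinalRoundRatings": [7, 4]})

# The unmatched row gets NaN, so the int64 column is upcast to float64.
merged = noteScores.merge(counts, on="noteId", how="left")
print(merged["numFinalRoundRatings"].dtype)  # float64

# With the nullable Int64 dtype, the gap becomes <NA> and the column stays integer.
counts["numFinalRoundRatings"] = counts["numFinalRoundRatings"].astype("Int64")
mergedNullable = noteScores.merge(counts, on="noteId", how="left")
print(mergedNullable["numFinalRoundRatings"].dtype)  # Int64
```

Whether to switch dtypes is a design decision. The log's checker marks these particular mismatches as allowed.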
INFO:birdwatch.scorer:MFGroupScorer_9 Postprocess output elapsed time: 54.36 secs (0.91 mins)
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.844300 | time=3.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=140 | loss=2.844249 | time=3.5s
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 12367, vs. num we are initializing: 56374
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 12367
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.540329 | time=0.0s
INFO:birdwatch.run_scoring:MFGroupScorer_5 run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.474979 | time=0.8s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.473973 | time=1.5s
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.473939 | time=1.9s
INFO:birdwatch.diligence_model:Low diligence final loss: 0.4739
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object')
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1',
'internalNoteFactor1_max', 'internalNoteFactor1_median',
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig',
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig',
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac',
'internalNoteIntercept_max', 'internalNoteIntercept_min',
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'],
dtype='object')
INFO:birdwatch.scorer:MFGroupScorer_7 Low Diligence Reputation Model elapsed time: 7.01 secs (0.12 mins)
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFGroupScorer_7
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 494: noteStats = tagAggregates.merge(noteStats, on=c.noteIdKey, how="outer")
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
INFO:birdwatch.constants:compute_scored_notes: compute tag aggregates elapsed time: 3.78 secs (0.06 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:82: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals.drop(columns_to_drop, inplace=True, axis=1)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:84: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = (
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:91: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ratings_w_user_totals[
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:95: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey] = ratings_w_user_totals[
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:99: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.totalRatingsMadeByRaterKey] = ratings_w_user_totals[
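The incorrect_filter.py warnings above combine both patterns: an inplace `drop` and inplace `fillna` calls on a frame that pandas treats as a possible slice. The sketch below is a warning-free equivalent. It assumes `ratings_w_user_totals` is derived by filtering another frame, and its data and column names are made up:

```python
import numpy as np
import pandas as pd

# Made-up data; column names are illustrative, not the scorer's constants.
ratings = pd.DataFrame({
    "raterParticipantId": ["a", "b", "c"],
    "incorrectTagRateByRater": [0.5, np.nan, 0.1],
    "totalRatingsMadeByRater": [10.0, np.nan, 3.0],
    "scratchColumn": [1, 2, 3],
})

# An explicit copy makes the subset an independent frame, so later writes are unambiguous.
ratings_w_user_totals = ratings[ratings["raterParticipantId"] != "c"].copy()

# Non-inplace drop, reassigned to the same name.
ratings_w_user_totals = ratings_w_user_totals.drop(columns=["scratchColumn"])

# Fill several columns in one call instead of chained df[col].fillna(..., inplace=True).
ratings_w_user_totals = ratings_w_user_totals.fillna(
    {"incorrectTagRateByRater": 0.0, "totalRatingsMadeByRater": 0.0}
)
```

Because every step returns a new frame, the source `ratings` frame is left unchanged.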
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 499: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on num_voters_interval: result=float64 expected=int64 (allowed)
INFO:birdwatch.constants:compute_scored_notes: compute incorrect aggregates elapsed time: 3.25 secs (0.05 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.63 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.12 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.76 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.03 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.66 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.61 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.11 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.74 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRHInertia (v1.0)
INFO:birdwatch.scorer: Ratings without assigned group: 0
INFO:birdwatch.constants:Calling score_notes: GeneralCRHInertia (v1.0) elapsed time: 0.97 secs (0.02 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRHInertia (v1.0) elapsed time: 1.59 secs (0.03 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: TagFilter (v1.0)
INFO:birdwatch.scoring_rules:Candidate notes prior to tag filtering: 1745208
INFO:birdwatch.scoring_rules:Checking note tags:
INFO:birdwatch.scoring_rules:notHelpfulOther
INFO:birdwatch.scoring_rules:notHelpfulIncorrect
INFO:birdwatch.scoring_rules:notHelpfulSourcesMissingOrUnreliable
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculationOrBias
INFO:birdwatch.scoring_rules:notHelpfulMissingKeyPoints
INFO:birdwatch.scoring_rules:notHelpfulOutdated
INFO:birdwatch.scoring_rules:notHelpfulHardToUnderstand
INFO:birdwatch.scoring_rules:outlier filtering disabled for tag: notHelpfulHardToUnderstand
INFO:birdwatch.scoring_rules:notHelpfulArgumentativeOrBiased
INFO:birdwatch.scoring_rules:notHelpfulOffTopic
INFO:birdwatch.scoring_rules:notHelpfulSpamHarassmentOrAbuse
INFO:birdwatch.scoring_rules:notHelpfulIrrelevantSources
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculation
INFO:birdwatch.scoring_rules:notHelpfulNoteNotNeeded
INFO:birdwatch.scoring_rules:Total {note, tag} pairs where tag filter logic triggered: 4582
INFO:birdwatch.scoring_rules:Total unique notes impacted by tag filtering: 2443
INFO:birdwatch.constants:Calling score_notes: TagFilter (v1.0) elapsed time: 2.98 secs (0.05 mins)
INFO:birdwatch.run_scoring:MFGroupScorer_5 run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFGroupScorer_5 run_scorer_parallelizable: Loading data elapsed time: 22.14 secs (0.37 mins)
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_5 set to: 4
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scoring_rules.py, in apply_scoring_rules, at line 1099: noteColumns = noteColumns.merge(
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
INFO:birdwatch.constants:Applying scoring rule: TagFilter (v1.0) elapsed time: 3.61 secs (0.06 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: CRHSuperThreshold (v1.0)
INFO:birdwatch.constants:Calling score_notes: CRHSuperThreshold (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_5. Original rating length: 118317340
INFO:birdwatch.scorer: Ratings after topic filter: 118317340
INFO:birdwatch.constants:Applying scoring rule: CRHSuperThreshold (v1.0) elapsed time: 0.74 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: ElevatedCRHInertia (v1.0)
INFO:birdwatch.scorer: Ratings after group filter: 5450867
INFO:birdwatch.scorer:MFGroupScorer_6 Filter input elapsed time: 42.99 secs (0.72 mins)
INFO:birdwatch.mf_base_scorer:seeding with 0
INFO:birdwatch.constants:Calling score_notes: ElevatedCRHInertia (v1.0) elapsed time: 0.95 secs (0.02 mins)
INFO:birdwatch.constants:Applying scoring rule: ElevatedCRHInertia (v1.0) elapsed time: 1.60 secs (0.03 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterIncorrect (v1.0)
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note:
Num Ratings: 4699227, Num Unique Notes Rated: 211221, Num Unique Raters: 39569
INFO:birdwatch.scorer:MFGroupScorer_6 Prepare ratings elapsed time: 2.44 secs (0.04 mins)
INFO:birdwatch.scoring_rules:Total notes impacted by incorrect filtering: 698
INFO:birdwatch.constants:Calling score_notes: FilterIncorrect (v1.0) elapsed time: 2.78 secs (0.05 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey]
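This FutureWarning concerns `fillna` on an object-dtype column. Here that is presumably a boolean flag column with missing values, although this is an assumption because the log shows only the call. The warning itself suggests calling `infer_objects(copy=False)` and opting into the future behavior. The sketch below swaps in a different route that avoids object dtype altogether by converting to pandas' nullable `boolean` dtype before filling. The data is made up:

```python
import pandas as pd

# Made-up object-dtype flag column with a missing entry.
flags = pd.Series([True, None, False], dtype="object")

# Convert to the nullable boolean dtype, fill the gap, then convert to plain numpy bool.
# No object-dtype fillna is performed, so no silent downcasting is involved.
filled = flags.astype("boolean").fillna(False).astype(bool)
print(filled.tolist())  # [True, False, False]
```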
INFO:birdwatch.constants:Applying scoring rule: FilterIncorrect (v1.0) elapsed time: 3.42 secs (0.06 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLowDiligence (v1.0)
INFO:birdwatch.helpfulness_scores:Unique Raters: 21925
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 94982
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 23379
INFO:birdwatch.scoring_rules:Total notes impacted by low diligence filtering: 19513
INFO:birdwatch.constants:Calling score_notes: FilterLowDiligence (v1.0) elapsed time: 2.86 secs (0.05 mins)
INFO:birdwatch.constants:Applying scoring rule: FilterLowDiligence (v1.0) elapsed time: 3.49 secs (0.06 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLargeFactor (v1.0)
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 21925
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 2990395
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 2990395
INFO:birdwatch.scoring_rules:Total notes impacted by large factor filtering: 90
INFO:birdwatch.constants:Calling score_notes: FilterLargeFactor (v1.0) elapsed time: 0.78 secs (0.01 mins)
INFO:birdwatch.constants:Applying scoring rule: FilterLargeFactor (v1.0) elapsed time: 1.41 secs (0.02 mins)
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 21925, Notes: 210221
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:initialized global intercept
INFO:birdwatch.matrix_factorization:learning rate set to :0.2
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 14.225006065045832
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 136.39201824401368
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 14.296365255016083
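The two "Ratings per ..." figures above appear consistent with counts logged nearby for MFGroupScorer_6: 2990395 final-training ratings divided by 210221 notes and by 21925 users. This is an inference from the log's own numbers, not from the scorer's source code. A quick check:

```python
# Counts copied from this log (MFGroupScorer_6: final training ratings, notes, users).
num_ratings = 2990395
num_notes = 210221
num_users = 21925

# Simple averages; compare with the "Ratings per note/user in dataset" log lines.
ratings_per_note = num_ratings / num_notes
ratings_per_user = num_ratings / num_users
print(ratings_per_note, ratings_per_user)
```

The "Correcting loss function" ratio on the following line is not reproduced by this simple division.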
INFO:birdwatch.model:Freezing parameter: user_factors.weight
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight
INFO:birdwatch.model:Freezing parameter: global_intercept
INFO:birdwatch.matrix_factorization:epoch 0 0.127536803483963
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.09117355197668076
INFO:birdwatch.matrix_factorization:epoch 20 0.13710755109786987
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10831429809331894
INFO:birdwatch.matrix_factorization:epoch 20 0.0995996817946434
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06900053471326828
INFO:birdwatch.matrix_factorization:epoch 40 0.09801977872848511
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06795614212751389
INFO:birdwatch.matrix_factorization:epoch 60 0.09782855212688446
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06773346662521362
INFO:birdwatch.matrix_factorization:Num epochs: 79
INFO:birdwatch.matrix_factorization:epoch 79 0.09780332446098328
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06769677996635437
INFO:birdwatch.matrix_factorization:Global Intercept: 0.1703343689441681
INFO:birdwatch.scorer:MFGroupScorer_6 Final helpfulness-filtered MF elapsed time: 21.15 secs (0.35 mins)
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_6 final scoring, about to call diligence with 2990395 final round ratings.
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key]
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key]
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
noteId internalNoteIntercept internalNoteFactor1 \
0 1699159156060475887 -2.116670 2.276408
1 1708036258310607099 -1.420304 2.767993
2 1708634843616248157 -0.909520 2.826810
3 1708698407043252372 2.160592 0.463363
4 1708722796358963422 -1.104113 3.109181
... ... ... ...
208923 1783304337789219115 -0.282226 0.765389
208924 1831341282007945725 -0.282333 0.765389
208925 1831414634940621056 -0.282371 0.765412
208926 1872283842301583792 1.119859 -2.117167
208927 1872298791140782443 -0.331062 0.838955
internalNoteInterceptRound2
0 -2.116670
1 -1.420304
2 -0.909520
3 2.160592
4 -1.104113
... ...
208923 -0.282226
208924 -0.282333
208925 -0.282371
208926 1.119859
208927 -0.331062
[208928 rows x 4 columns],
raterInitState: | |
raterParticipantId \ | |
0 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... | |
1 00018DBB934257251EBCEE91D0722C71B7DD592A571398... | |
2 0002188E5ED3028646C97CBE9ADCD12CB5B8BFAF8819BD... | |
3 0002725E706CF18C040E21F30CE2D39994513C3BB8CF58... | |
4 00029D1FDD352D79B5073189C3F2BDF6377581F50D66C1... | |
... ... | |
94977 FFFDAB98EE31EC0CC51169937F859D5B676870C6470C19... | |
94978 FFFEB058BCC25277E2662DD3E8C0506FB1B23BA4D965EA... | |
94979 FFFEB27D6E27351D14EB43777F265F694744ABB4B3B7AD... | |
94980 FFFF0C7BF4089C6436CAB332B309A1A81C21E11CD61CE4... | |
94981 FFFFAB2FDBC1968F4CFE97A86D88963D702B636365B6CD... | |
internalRaterIntercept internalRaterFactor1 internalRaterReputation \ | |
0 NaN NaN NaN | |
1 NaN NaN NaN | |
2 -0.007653 -2.017097 0.187599 | |
3 NaN NaN NaN | |
4 NaN NaN NaN | |
... ... ... ... | |
94977 NaN NaN NaN | |
94978 NaN NaN NaN | |
94979 NaN NaN NaN | |
94980 NaN NaN NaN | |
94981 -0.153286 -0.331380 0.446786 | |
internalRaterInterceptRound2 | |
0 NaN
1 NaN
2 -0.007653
3 NaN
4 NaN
... ...
94977 NaN
94978 NaN
94979 NaN
94980 NaN
94981 -0.153286
[94982 rows x 5 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 21925, vs. num we are initializing: 94982
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 21925
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 21925, vs. num we are initializing: 94982
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 21925
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 21925, vs. num we are initializing: 94982
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 21925
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 210221, vs. num we are initializing: 208928
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 203877
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 6344
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 210221, vs. num we are initializing: 208928
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 203877
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 6344
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor)
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=5.512455 | time=0.0s
INFO:birdwatch.scorer: Ratings without assigned group: 0
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.596907 | time=2.6s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.565194 | time=5.2s
INFO:birdwatch.scorer: Ratings after group filter: 555225
INFO:birdwatch.scorer:MFGroupScorer_5 Filter input elapsed time: 41.13 secs (0.69 mins)
INFO:birdwatch.mf_base_scorer:seeding with 0
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note:
Num Ratings: 251048, Num Unique Notes Rated: 23240, Num Unique Raters: 6129
INFO:birdwatch.scorer:MFGroupScorer_5 Prepare ratings elapsed time: 0.17 secs (0.00 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey]
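The FutureWarning above is raised because `fillna(False)` is applied to an object-dtype column, and pandas 2.2 deprecates silently downcasting the filled result to `bool`. A minimal sketch of one way to avoid it, assuming the column only ever holds `True`, `False`, or a missing value (as a left merge can leave behind for unmatched rows): convert to the nullable `boolean` dtype, fill, then cast to NumPy `bool`. The frame and the column name `aboveThreshold` are hypothetical stand-ins for `c.aboveHelpfulnessThresholdKey`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: an object-dtype flag column with one missing value.
df = pd.DataFrame(
    {
        "raterParticipantId": ["a", "b", "c"],
        "aboveThreshold": pd.Series([True, np.nan, False], dtype=object),
    }
)

# Calling .fillna(False) directly on the object column is what triggers the
# downcasting FutureWarning. Going through the nullable "boolean" dtype makes
# the fill explicit, then .astype(bool) yields a plain NumPy bool column.
df["aboveThreshold"] = df["aboveThreshold"].astype("boolean").fillna(False).astype(bool)

print(df["aboveThreshold"].dtype)     # bool
print(df["aboveThreshold"].tolist())  # [True, False, False]
```

Treating a missing flag as `False` mirrors what the logged `fillna(False)` call already does; only the dtype path changes.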
INFO:birdwatch.helpfulness_scores:Unique Raters: 2990
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 16995
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 3458
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 2990
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 187149
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 187149
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 2990, Notes: 23224
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
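Both FutureWarnings above come from calling `fillna(..., inplace=True)` on a column selected with `noteInit[...]`. Under copy-on-write (the pandas 3.0 behavior the warning describes), that intermediate Series always behaves as a copy, so the in-place fill would be lost. A minimal sketch of the two replacements the warning text itself suggests, using a small hypothetical `noteInit` frame whose column names only mirror those in the log:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for noteInit: some notes have no initial values.
noteInit = pd.DataFrame(
    {
        "noteId": [1, 2, 3],
        "internalNoteIntercept": [0.4, np.nan, -0.1],
        "internalNoteFactor1": [np.nan, 0.2, np.nan],
    }
)

# Option 1: assign the filled column back instead of filling a chained
# selection in place.
noteInit["internalNoteIntercept"] = noteInit["internalNoteIntercept"].fillna(0.0)

# Option 2: let the DataFrame perform the in-place fill, keyed by column name.
noteInit.fillna({"internalNoteFactor1": 0.0}, inplace=True)

print(noteInit)
```

Both forms act on `noteInit` itself rather than on a temporary column object, so they behave the same with or without copy-on-write.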
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:initialized global intercept
INFO:birdwatch.matrix_factorization:learning rate set to :0.2
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 8.058430933517052
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 62.591638795986626
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 8.295422773393462
INFO:birdwatch.model:Freezing parameter: user_factors.weight
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight
INFO:birdwatch.model:Freezing parameter: global_intercept
INFO:birdwatch.matrix_factorization:epoch 0 0.15475018322467804
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10480927675962448
INFO:birdwatch.matrix_factorization:epoch 20 0.09636855125427246
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05821274593472481
INFO:birdwatch.matrix_factorization:epoch 40 0.09162192791700363
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05588897317647934
INFO:birdwatch.matrix_factorization:epoch 60 0.09138025343418121
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.056005384773015976
INFO:birdwatch.matrix_factorization:epoch 80 0.09134545177221298
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05613483488559723
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.562311 | time=7.7s
INFO:birdwatch.matrix_factorization:Num epochs: 94
INFO:birdwatch.matrix_factorization:epoch 94 0.09133845567703247
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.05623766407370567
INFO:birdwatch.matrix_factorization:Global Intercept: 0.18524184823036194
INFO:birdwatch.scorer:MFGroupScorer_5 Final helpfulness-filtered MF elapsed time: 1.46 secs (0.02 mins)
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_5 final scoring, about to call diligence with 187149 final round ratings.
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key]
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key]
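The SettingWithCopyWarnings at lines 467 and 470 indicate that `noteInitState` and `raterInitState` were derived from another DataFrame (for example by a boolean filter) and are then written to, so pandas cannot tell whether the write should reach the parent. A minimal sketch of the usual fix, assuming the derived frame is meant to be independent: take an explicit `.copy()` when the subset is created. The `allNotes` frame and its filter are hypothetical.

```python
import pandas as pd

# Hypothetical parent frame holding prescoring output.
allNotes = pd.DataFrame(
    {
        "noteId": [10, 11, 12],
        "internalNoteInterceptRound2": [-0.57, -3.50, 0.23],
        "internalNoteIntercept": [0.0, 0.0, 0.0],
    }
)

# An explicit copy makes noteInitState its own object, so the column
# assignment below is unambiguous and does not touch allNotes.
noteInitState = allNotes[allNotes["noteId"] != 11].copy()
noteInitState["internalNoteIntercept"] = noteInitState["internalNoteInterceptRound2"]

print(noteInitState)
print(allNotes["internalNoteIntercept"].tolist())  # [0.0, 0.0, 0.0]
```

If the write is instead meant to update the parent, the assignment belongs on the parent via `.loc` rather than on the subset.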
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState:
noteId internalNoteIntercept internalNoteFactor1 \
0 1685671586295099392 -0.572634 -2.337576
1 1694265338060222629 -3.497196 0.750307
2 1708971742826213665 -1.419062 0.355426
3 1709029742886760650 -1.307871 -0.335284
4 1710469668035801230 -1.403838 0.487615
... ... ... ...
22170 1853499093554770105 -0.327273 -0.712118
22171 1779889399464935794 0.229670 -1.031546
22172 1821313194553700446 0.234071 -1.033639
22173 1821335118532731025 0.231755 -1.033731
22174 1854148348426547432 -0.449657 1.033040
internalNoteInterceptRound2
0 -0.572634
1 -3.497196
2 -1.419062
3 -1.307871
4 -1.403838
... ...
22170 -0.327273
22171 0.229670
22172 0.234071
22173 0.231755
22174 -0.449657
[22175 rows x 4 columns],
raterInitState:
raterParticipantId \
0 00029D1FDD352D79B5073189C3F2BDF6377581F50D66C1...
1 00053CDCAC04E3692F4A01305C8F3D093CCE221157D539...
2 0005983E6E18862483AB372C5B61FEBC1F8A573E7701F9...
3 000F1687C56AB92D846F2B9BFA71AE16D8A88426754E3B...
4 0011AB5425173F62E5D4A1787E34ED324BDD5807D4C3B8...
... ...
16990 FFDC71F0AE061FDEC1E553DBEADDD7EFBD520C6EA87C6F...
16991 FFE87CF4860C52665B228E9F345BB3EE183994416FA6D7...
16992 FFEA6CF8956CF5972B2086A17F147FCC0B59CBD4CE0C7E...
16993 FFF3E935633C6870DE7674D0681C5821BC408073C84A36...
16994 FFF6DBEDE9ED4DC6A61291E33742D1805155E385475E43...
internalRaterIntercept internalRaterFactor1 internalRaterReputation \
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
... ... ... ...
16990 NaN NaN NaN
16991 NaN NaN NaN
16992 NaN NaN NaN
16993 NaN NaN NaN
16994 NaN NaN NaN
internalRaterInterceptRound2
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
... ...
16990 NaN
16991 NaN
16992 NaN
16993 NaN
16994 NaN
[16995 rows x 5 columns]
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 2990, vs. num we are initializing: 16995
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 2990
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 2990, vs. num we are initializing: 16995
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 2990
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 2990, vs. num we are initializing: 16995
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 2990
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 23224, vs. num we are initializing: 22175
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 22727
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 497
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 23224, vs. num we are initializing: 22175
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 22727
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 497
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor)
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=7.216479 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.579757 | time=0.2s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.539598 | time=0.4s
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.534247 | time=0.5s
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.533572 | time=0.7s
INFO:birdwatch.reputation_matrix_factorization:epoch=150 | loss=2.533401 | time=0.9s
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.533329 | time=1.1s
INFO:birdwatch.reputation_matrix_factorization:epoch=180 | loss=2.533329 | time=1.1s
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 2990, vs. num we are initializing: 16995
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 2990
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.586357 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.497570 | time=0.2s
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.495975 | time=0.4s
INFO:birdwatch.reputation_matrix_factorization:epoch=075 | loss=0.495925 | time=0.4s
INFO:birdwatch.diligence_model:Low diligence final loss: 0.4959
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object')
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1',
'internalNoteFactor1_max', 'internalNoteFactor1_median',
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig',
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig',
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac',
'internalNoteIntercept_max', 'internalNoteIntercept_min',
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'],
dtype='object')
INFO:birdwatch.scorer:MFGroupScorer_5 Low Diligence Reputation Model elapsed time: 1.89 secs (0.03 mins)
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFGroupScorer_5
INFO:birdwatch.reputation_matrix_factorization:epoch=120 | loss=2.562075 | time=10.3s
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
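The `MERGE ERROR` lines report that `numRatings` and `numRatingsLast28` came out of the merge as `float64` instead of `int64` (flagged as allowed). One common way this happens in pandas: after a left or outer merge, rows without a match get `NaN`, and NumPy `int64` columns cannot hold `NaN`, so the column is upcast to `float64`. A minimal sketch with hypothetical frames, showing the upcast and two ways to keep an integer dtype; whether a missing count should mean 0 is an assumption to check against the scorer's intent.

```python
import pandas as pd

notes = pd.DataFrame({"noteId": [1, 2, 3]})
# Hypothetical per-note rating counts; note 3 has no ratings.
counts = pd.DataFrame({"noteId": [1, 2], "numRatings": [5, 8]})

merged = notes.merge(counts, on="noteId", how="left")
print(merged["numRatings"].dtype)  # float64, because the unmatched row is NaN

# Option 1: treat a missing count as 0 and restore int64.
as_int = merged["numRatings"].fillna(0).astype("int64")

# Option 2: keep the missing value, using pandas' nullable Int64 dtype.
as_nullable = merged["numRatings"].astype("Int64")

print(as_int.tolist())  # [5, 8, 0]
```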
INFO:birdwatch.reputation_matrix_factorization:epoch=130 | loss=2.562057 | time=11.1s
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept:
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 21925, vs. num we are initializing: 94982
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 21925
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.430606 | time=0.0s
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.386664 | time=2.6s
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 40.42 secs (0.67 mins)
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 494: noteStats = tagAggregates.merge(noteStats, on=c.noteIdKey, how="outer")
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
INFO:birdwatch.constants:compute_scored_notes: compute tag aggregates elapsed time: 2.18 secs (0.04 mins)
INFO:birdwatch.mf_base_scorer:sn cols: Index(['noteId', 'ratingWeight', 'notHelpfulOtherAdjusted',
'notHelpfulIncorrectAdjusted',
'notHelpfulSourcesMissingOrUnreliableAdjusted',
'notHelpfulOpinionSpeculationOrBiasAdjusted',
'notHelpfulMissingKeyPointsAdjusted', 'notHelpfulOutdatedAdjusted',
'notHelpfulHardToUnderstandAdjusted',
'notHelpfulArgumentativeOrBiasedAdjusted', 'notHelpfulOffTopicAdjusted',
'notHelpfulSpamHarassmentOrAbuseAdjusted',
'notHelpfulIrrelevantSourcesAdjusted',
'notHelpfulOpinionSpeculationAdjusted',
'notHelpfulNoteNotNeededAdjusted', 'notHelpfulOtherAdjustedRatio',
'notHelpfulIncorrectAdjustedRatio',
'notHelpfulSourcesMissingOrUnreliableAdjustedRatio',
'notHelpfulOpinionSpeculationOrBiasAdjustedRatio',
'notHelpfulMissingKeyPointsAdjustedRatio',
'notHelpfulOutdatedAdjustedRatio',
'notHelpfulHardToUnderstandAdjustedRatio',
'notHelpfulArgumentativeOrBiasedAdjustedRatio',
'notHelpfulOffTopicAdjustedRatio',
'notHelpfulSpamHarassmentOrAbuseAdjustedRatio',
'notHelpfulIrrelevantSourcesAdjustedRatio',
'notHelpfulOpinionSpeculationAdjustedRatio',
'notHelpfulNoteNotNeededAdjustedRatio', 'helpfulOther',
'helpfulInformative', 'helpfulClear', 'helpfulEmpathetic',
'helpfulGoodSources', 'helpfulUniqueContext', 'helpfulAddressesClaim',
'helpfulImportantContext', 'helpfulUnbiasedLanguage', 'notHelpfulOther',
'notHelpfulIncorrect', 'notHelpfulSourcesMissingOrUnreliable',
'notHelpfulOpinionSpeculationOrBias', 'notHelpfulMissingKeyPoints',
'notHelpfulOutdated', 'notHelpfulHardToUnderstand',
'notHelpfulArgumentativeOrBiased', 'notHelpfulOffTopic',
'notHelpfulSpamHarassmentOrAbuse', 'notHelpfulIrrelevantSources',
'notHelpfulOpinionSpeculation', 'notHelpfulNoteNotNeeded', 'numRatings',
'noteAuthorParticipantId', 'classification', 'currentStatus',
'internalNoteIntercept', 'internalNoteFactor1',
'lowDiligenceNoteIntercept', 'internalNoteFactor1_max',
'internalNoteFactor1_median', 'internalNoteFactor1_min',
'internalNoteFactor1_refit_orig', 'internalNoteIntercept_median',
'internalNoteIntercept_refit_orig', 'ratingCount_all',
'ratingCount_neg_fac', 'ratingCount_pos_fac',
'internalNoteIntercept_max', 'internalNoteIntercept_min',
'notHelpfulIncorrect_interval', 'p_incorrect_user_interval',
'num_voters_interval', 'tf_idf_incorrect_interval',
'internalRatingStatus', 'internalActiveRules', 'activeFilterTags',
'crhBool', 'crnhBool', 'awaitingBool'],
dtype='object')
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:82: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals.drop(columns_to_drop, inplace=True, axis=1)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:84: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = (
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:91: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ratings_w_user_totals[
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:95: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey] = ratings_w_user_totals[
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:99: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.totalRatingsMadeByRaterKey] = ratings_w_user_totals[
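In `incorrect_filter.py`, `ratings_w_user_totals` is itself flagged as a slice, so the `drop(..., inplace=True)` at line 82 and the later in-place `fillna` calls all warn. A minimal sketch of a non-in-place style that rebinds the name to new frames instead of mutating a possible view; the frame, the filter, and the column names `tmpCol` and `incorrectTagRateByRater` are hypothetical, and the real computations at lines 84 to 99 are truncated in the log, so they are not reproduced here.

```python
import numpy as np
import pandas as pd

ratings = pd.DataFrame(
    {
        "raterParticipantId": ["a", "b", "c"],
        "tmpCol": [1, 2, 3],
        "incorrectTagRateByRater": [0.5, np.nan, 0.1],
    }
)

# Hypothetical filtered subset, the kind of object pandas flags as a slice.
subset = ratings[ratings["raterParticipantId"] != "z"]

# drop() without inplace returns a new frame; rebinding the name avoids
# mutating the slice.
subset = subset.drop(columns=["tmpCol"])

# assign() also returns a new frame, so the fill never writes into a view.
subset = subset.assign(
    incorrectTagRateByRater=subset["incorrectTagRateByRater"].fillna(0)
)

print(subset)
```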
INFO:birdwatch.scorer:MFGroupScorer_7 Final compute scored notes elapsed time: 72.60 secs (1.21 mins)
INFO:birdwatch.scorer:Postprocessing output for MFGroupScorer_7
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.386047 | time=5.1s
INFO:birdwatch.reputation_matrix_factorization:epoch=070 | loss=0.386029 | time=5.9s
INFO:birdwatch.diligence_model:Low diligence final loss: 0.3860
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object')
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1',
'internalNoteFactor1_max', 'internalNoteFactor1_median',
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig',
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig',
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac',
'internalNoteIntercept_max', 'internalNoteIntercept_min',
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'],
dtype='object')
INFO:birdwatch.scorer:MFGroupScorer_6 Low Diligence Reputation Model elapsed time: 21.85 secs (0.36 mins)
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFGroupScorer_6
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 499: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on num_voters_interval: result=float64 expected=int64 (allowed)
INFO:birdwatch.constants:compute_scored_notes: compute incorrect aggregates elapsed time: 2.34 secs (0.04 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.61 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.11 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.66 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.61 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.54 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.74 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRHInertia (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRHInertia (v1.0) elapsed time: 0.97 secs (0.02 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRHInertia (v1.0) elapsed time: 1.53 secs (0.03 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: TagFilter (v1.0)
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
INFO:birdwatch.scoring_rules:Candidate notes prior to tag filtering: 1749131
INFO:birdwatch.scoring_rules:Checking note tags:
INFO:birdwatch.scoring_rules:notHelpfulOther
INFO:birdwatch.scoring_rules:notHelpfulIncorrect
INFO:birdwatch.scoring_rules:notHelpfulSourcesMissingOrUnreliable
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculationOrBias
INFO:birdwatch.scoring_rules:notHelpfulMissingKeyPoints
INFO:birdwatch.scoring_rules:notHelpfulOutdated
INFO:birdwatch.scoring_rules:notHelpfulHardToUnderstand
INFO:birdwatch.scoring_rules:outlier filtering disabled for tag: notHelpfulHardToUnderstand
INFO:birdwatch.scoring_rules:notHelpfulArgumentativeOrBiased
INFO:birdwatch.scoring_rules:notHelpfulOffTopic
INFO:birdwatch.scoring_rules:notHelpfulSpamHarassmentOrAbuse
INFO:birdwatch.scoring_rules:notHelpfulIrrelevantSources
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculation
INFO:birdwatch.scoring_rules:notHelpfulNoteNotNeeded
INFO:birdwatch.scoring_rules:Total {note, tag} pairs where tag filter logic triggered: 279
INFO:birdwatch.scoring_rules:Total unique notes impacted by tag filtering: 188
INFO:birdwatch.constants:Calling score_notes: TagFilter (v1.0) elapsed time: 2.84 secs (0.05 mins)
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scoring_rules.py, in apply_scoring_rules, at line 1099: noteColumns = noteColumns.merge(
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
INFO:birdwatch.constants:Applying scoring rule: TagFilter (v1.0) elapsed time: 3.45 secs (0.06 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: CRHSuperThreshold (v1.0)
INFO:birdwatch.constants:Calling score_notes: CRHSuperThreshold (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: CRHSuperThreshold (v1.0) elapsed time: 0.66 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: ElevatedCRHInertia (v1.0)
INFO:birdwatch.constants:Calling score_notes: ElevatedCRHInertia (v1.0) elapsed time: 0.89 secs (0.01 mins)
INFO:birdwatch.constants:Applying scoring rule: ElevatedCRHInertia (v1.0) elapsed time: 1.48 secs (0.02 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterIncorrect (v1.0)
INFO:birdwatch.scoring_rules:Total notes impacted by incorrect filtering: 71
INFO:birdwatch.constants:Calling score_notes: FilterIncorrect (v1.0) elapsed time: 2.69 secs (0.04 mins)
INFO:birdwatch.matrix_factorization:epoch 40 0.13590402901172638
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10660792887210846
INFO:birdwatch.constants:Applying scoring rule: FilterIncorrect (v1.0) elapsed time: 3.37 secs (0.06 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLowDiligence (v1.0)
INFO:birdwatch.scoring_rules:Total notes impacted by low diligence filtering: 4749
INFO:birdwatch.constants:Calling score_notes: FilterLowDiligence (v1.0) elapsed time: 2.70 secs (0.05 mins)
INFO:birdwatch.constants:Applying scoring rule: FilterLowDiligence (v1.0) elapsed time: 3.29 secs (0.05 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLargeFactor (v1.0)
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 494: noteStats = tagAggregates.merge(noteStats, on=c.noteIdKey, how="outer")
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
INFO:birdwatch.scoring_rules:Total notes impacted by large factor filtering: 18
INFO:birdwatch.constants:Calling score_notes: FilterLargeFactor (v1.0) elapsed time: 0.73 secs (0.01 mins)
INFO:birdwatch.constants:compute_scored_notes: compute tag aggregates elapsed time: 8.77 secs (0.15 mins)
INFO:birdwatch.constants:Applying scoring rule: FilterLargeFactor (v1.0) elapsed time: 1.28 secs (0.02 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:82: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals.drop(columns_to_drop, inplace=True, axis=1)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:84: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = (
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:91: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:95: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:99: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
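The incorrect_filter.py warnings above come from two patterns. The first is editing `ratings_w_user_totals` in place while pandas tracks it as a possible slice of another DataFrame. The second is calling `fillna(0, inplace=True)` on a selected column, which is chained assignment. Below is a minimal sketch of the usual remedy. It assumes the frame is produced by boolean row filtering; how it is actually built is not shown in this log. The idea is to take one explicit `.copy()`, then drop and reassign columns on that copy. The function `prepare_rater_totals` and the column names `totalRatings`, `incorrectTagRate`, and `scratch` are hypothetical stand-ins, not the scorer's constants.

```python
import pandas as pd


def prepare_rater_totals(
    ratings: pd.DataFrame,
    min_total: int,
    columns_to_drop: list[str],
    columns_to_fill: list[str],
) -> pd.DataFrame:
    # Hypothetical helper; column names are illustrative, not the scorer's constants.
    # Boolean filtering can return a frame that pandas tracks as derived from
    # `ratings`; .copy() makes it independent, so later edits touch only `subset`.
    subset = ratings[ratings["totalRatings"] >= min_total].copy()

    # A non-inplace drop returns a new frame and never writes into `ratings`.
    subset = subset.drop(columns=columns_to_drop)

    # Reassign each column instead of df[col].fillna(..., inplace=True);
    # that chained form stops updating the parent frame in pandas 3.0.
    for col in columns_to_fill:
        subset[col] = subset[col].fillna(0)
    return subset
```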
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 499: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on num_voters_interval: result=float64 expected=int64 (allowed)
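The "(allowed)" merge mismatches, such as `num_voters_interval` coming back as float64, match standard pandas behaviour. When a left or outer merge leaves some rows unmatched, those cells become NaN. A NumPy int64 column cannot hold NaN, so pandas upcasts it to float64. The sketch below uses made-up data to show the upcast and one way to keep integer semantics with pandas' nullable `Int64` dtype. Whether the scorer should adopt this is an assumption; the log does not say. `merge_counts` is a hypothetical helper.

```python
import pandas as pd


def merge_counts(notes: pd.DataFrame, counts: pd.DataFrame, nullable: bool) -> pd.DataFrame:
    # Hypothetical helper. Left merge: notes with no row in `counts` get a missing count.
    if nullable:
        # The nullable Int64 extension dtype stores missing values as <NA>,
        # so the merged column stays integer instead of becoming float64.
        counts = counts.astype({"num_voters_interval": "Int64"})
    return notes.merge(counts, on="noteId", how="left")


notes = pd.DataFrame({"noteId": [1, 2, 3]})
counts = pd.DataFrame({"noteId": [1, 3], "num_voters_interval": [4, 7]})

print(merge_counts(notes, counts, nullable=False)["num_voters_interval"].dtype)  # float64
print(merge_counts(notes, counts, nullable=True)["num_voters_interval"].dtype)   # Int64
```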
INFO:birdwatch.constants:compute_scored_notes: compute incorrect aggregates elapsed time: 7.10 secs (0.12 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0)
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.63 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.13 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.77 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.03 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.67 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.63 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0)
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.10 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.74 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRHInertia (v1.0)
INFO:birdwatch.constants:Calling score_notes: GeneralCRHInertia (v1.0) elapsed time: 0.95 secs (0.02 mins)
INFO:birdwatch.constants:Applying scoring rule: GeneralCRHInertia (v1.0) elapsed time: 1.59 secs (0.03 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: TagFilter (v1.0)
INFO:birdwatch.scoring_rules:Candidate notes prior to tag filtering: 1738088
INFO:birdwatch.scoring_rules:Checking note tags:
INFO:birdwatch.scoring_rules:notHelpfulOther
INFO:birdwatch.scoring_rules:notHelpfulIncorrect
INFO:birdwatch.scoring_rules:notHelpfulSourcesMissingOrUnreliable
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculationOrBias
INFO:birdwatch.scoring_rules:notHelpfulMissingKeyPoints
INFO:birdwatch.scoring_rules:notHelpfulOutdated
INFO:birdwatch.scoring_rules:notHelpfulHardToUnderstand
INFO:birdwatch.scoring_rules:outlier filtering disabled for tag: notHelpfulHardToUnderstand
INFO:birdwatch.scoring_rules:notHelpfulArgumentativeOrBiased
INFO:birdwatch.scoring_rules:notHelpfulOffTopic
INFO:birdwatch.scoring_rules:notHelpfulSpamHarassmentOrAbuse
INFO:birdwatch.scoring_rules:notHelpfulIrrelevantSources
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculation
INFO:birdwatch.scoring_rules:notHelpfulNoteNotNeeded
INFO:birdwatch.scoring_rules:Total {note, tag} pairs where tag filter logic triggered: 9443
INFO:birdwatch.scoring_rules:Total unique notes impacted by tag filtering: 5436
INFO:birdwatch.constants:Calling score_notes: TagFilter (v1.0) elapsed time: 3.01 secs (0.05 mins)
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scoring_rules.py, in apply_scoring_rules, at line 1099: noteColumns = noteColumns.merge(
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64
INFO:birdwatch.constants:Applying scoring rule: TagFilter (v1.0) elapsed time: 3.66 secs (0.06 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: CRHSuperThreshold (v1.0)
INFO:birdwatch.constants:Calling score_notes: CRHSuperThreshold (v1.0) elapsed time: 0.11 secs (0.00 mins)
INFO:birdwatch.constants:Applying scoring rule: CRHSuperThreshold (v1.0) elapsed time: 0.77 secs (0.01 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: ElevatedCRHInertia (v1.0)
INFO:birdwatch.constants:Calling score_notes: ElevatedCRHInertia (v1.0) elapsed time: 0.99 secs (0.02 mins)
INFO:birdwatch.constants:Applying scoring rule: ElevatedCRHInertia (v1.0) elapsed time: 1.65 secs (0.03 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterIncorrect (v1.0)
INFO:birdwatch.scoring_rules:Total notes impacted by incorrect filtering: 1489
INFO:birdwatch.constants:Calling score_notes: FilterIncorrect (v1.0) elapsed time: 2.82 secs (0.05 mins)
INFO:birdwatch.constants:Applying scoring rule: FilterIncorrect (v1.0) elapsed time: 3.46 secs (0.06 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLowDiligence (v1.0)
INFO:birdwatch.scoring_rules:Total notes impacted by low diligence filtering: 41610
INFO:birdwatch.constants:Calling score_notes: FilterLowDiligence (v1.0) elapsed time: 2.80 secs (0.05 mins)
INFO:birdwatch.constants:Applying scoring rule: FilterLowDiligence (v1.0) elapsed time: 3.47 secs (0.06 mins)
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLargeFactor (v1.0)
INFO:birdwatch.scoring_rules:Total notes impacted by large factor filtering: 90
INFO:birdwatch.constants:Calling score_notes: FilterLargeFactor (v1.0) elapsed time: 0.82 secs (0.01 mins)
INFO:birdwatch.constants:Applying scoring rule: FilterLargeFactor (v1.0) elapsed time: 1.46 secs (0.02 mins)
INFO:birdwatch.scorer: Original noteScores length: 1750506
INFO:birdwatch.scorer: Final noteScores length: 8659
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in score_final, at line 1190: noteScores = noteScores.merge(
PandasTypeError: Output mismatch on numFinalRoundRatings: result=float64 expected=int64 (allowed)
INFO:birdwatch.scorer:MFGroupScorer_7 Postprocess output elapsed time: 55.39 secs (0.92 mins)
INFO:birdwatch.run_scoring:MFGroupScorer_4 run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 36.91 secs (0.62 mins)
INFO:birdwatch.mf_base_scorer:sn cols: Index(['noteId', 'ratingWeight', 'notHelpfulOtherAdjusted',
'notHelpfulIncorrectAdjusted',
'notHelpfulSourcesMissingOrUnreliableAdjusted',
'notHelpfulOpinionSpeculationOrBiasAdjusted',
'notHelpfulMissingKeyPointsAdjusted', 'notHelpfulOutdatedAdjusted',
'notHelpfulHardToUnderstandAdjusted',
'notHelpfulArgumentativeOrBiasedAdjusted', 'notHelpfulOffTopicAdjusted',
'notHelpfulSpamHarassmentOrAbuseAdjusted',
'notHelpfulIrrelevantSourcesAdjusted',
'notHelpfulOpinionSpeculationAdjusted',
'notHelpfulNoteNotNeededAdjusted', 'notHelpfulOtherAdjustedRatio',
'notHelpfulIncorrectAdjustedRatio',
'notHelpfulSourcesMissingOrUnreliableAdjustedRatio',
'notHelpfulOpinionSpeculationOrBiasAdjustedRatio',
'notHelpfulMissingKeyPointsAdjustedRatio',
'notHelpfulOutdatedAdjustedRatio',
'notHelpfulHardToUnderstandAdjustedRatio',
'notHelpfulArgumentativeOrBiasedAdjustedRatio',
'notHelpfulOffTopicAdjustedRatio',
'notHelpfulSpamHarassmentOrAbuseAdjustedRatio',
'notHelpfulIrrelevantSourcesAdjustedRatio',
'notHelpfulOpinionSpeculationAdjustedRatio',
'notHelpfulNoteNotNeededAdjustedRatio', 'helpfulOther',
'helpfulInformative', 'helpfulClear', 'helpfulEmpathetic',
'helpfulGoodSources', 'helpfulUniqueContext', 'helpfulAddressesClaim',
'helpfulImportantContext', 'helpfulUnbiasedLanguage', 'notHelpfulOther',
'notHelpfulIncorrect', 'notHelpfulSourcesMissingOrUnreliable',
'notHelpfulOpinionSpeculationOrBias', 'notHelpfulMissingKeyPoints',
'notHelpfulOutdated', 'notHelpfulHardToUnderstand',
'notHelpfulArgumentativeOrBiased', 'notHelpfulOffTopic',
'notHelpfulSpamHarassmentOrAbuse', 'notHelpfulIrrelevantSources',
'notHelpfulOpinionSpeculation', 'notHelpfulNoteNotNeeded', 'numRatings',
'noteAuthorParticipantId', 'classification', 'currentStatus',
'internalNoteIntercept', 'internalNoteFactor1',
'lowDiligenceNoteIntercept', 'internalNoteFactor1_max',
'internalNoteFactor1_median', 'internalNoteFactor1_min',
'internalNoteFactor1_refit_orig', 'internalNoteIntercept_median',
'internalNoteIntercept_refit_orig', 'ratingCount_all',
'ratingCount_neg_fac', 'ratingCount_pos_fac',
'internalNoteIntercept_max', 'internalNoteIntercept_min',
'notHelpfulIncorrect_interval', 'p_incorrect_user_interval',
'num_voters_interval', 'tf_idf_incorrect_interval',
'internalRatingStatus', 'internalActiveRules', 'activeFilterTags',
'crhBool', 'crnhBool', 'awaitingBool'],
dtype='object')
INFO:birdwatch.scorer:MFGroupScorer_5 Final compute scored notes elapsed time: 63.91 secs (1.07 mins)
INFO:birdwatch.scorer:Postprocessing output for MFGroupScorer_5
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
INFO:birdwatch.matrix_factorization:Num epochs: 60
INFO:birdwatch.matrix_factorization:epoch 60 0.13572043180465698
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10619454830884933
INFO:birdwatch.constants:Pseudo: fit all notes with raters constant elapsed time: 162.26 secs (2.70 mins)
INFO:birdwatch.run_scoring:MFGroupScorer_4 run_scorer_parallelizable just finished loading data from shared memory.
INFO:birdwatch.constants:MFGroupScorer_4 run_scorer_parallelizable: Loading data elapsed time: 21.56 secs (0.36 mins)
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_4 set to: 4
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_4. Original rating length: 118317340
INFO:birdwatch.scorer: Ratings after topic filter: 118317340
INFO:birdwatch.pseudo_raters:------------------
INFO:birdwatch.pseudo_raters:Re-scoring all notes with extra rating added: {'raterParticipantId': '-3', 'raterIndex': 377371, 'internalRaterIntercept': -0.48738948, 'internalRaterFactor1': 1.0582559, 'helpfulNum': 1.0}
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge(
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed)
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed)
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 377370, Notes: 1205353
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
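Both matrix_factorization.py FutureWarnings flag the same pattern: `noteInit[col].fillna(0.0, inplace=True)`. The warning text itself suggests the fix. Issue the fill once on the DataFrame, passing a `{column: value}` mapping. The sketch below does that. `init_note_params` is a hypothetical helper, and the column names come from the scored-notes column list printed elsewhere in this log.

```python
import pandas as pd


def init_note_params(noteInit: pd.DataFrame, factorCols: list[str]) -> pd.DataFrame:
    # Hypothetical helper. Build one {column: fill value} mapping covering the
    # intercept and every factor column, then fill on the DataFrame itself rather
    # than on a selected column, so the update lands in `noteInit` itself.
    fillValues = {col: 0.0 for col in ["internalNoteIntercept", *factorCols]}
    noteInit.fillna(fillValues, inplace=True)
    return noteInit
```

With a single mapping, only the listed parameter columns are filled. Other columns, such as `noteId`, are left untouched.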
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:initialized global intercept
INFO:birdwatch.matrix_factorization:learning rate set to :0.2
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.model:Freezing parameter: global_intercept
INFO:birdwatch.model:Freezing parameter: user_factors.weight
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight
INIT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/pseudo_raters.py, in _check_note_parameters_same, at line 90: assert (noteParamsFromNewModel == self.noteParams).all().all()
PandasTypeError: Type expectation mismatch on noteId: found=bool expected=int64
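The INIT error above comes from the assertion itself, not from the model. `noteParamsFromNewModel == self.noteParams` compares the two DataFrames element by element and returns a DataFrame of booleans, so its `noteId` column is bool by construction. Below is a minimal sketch of a comparison that keeps `noteId` as an index label instead. It assumes both frames hold the same noteIds and the same parameter columns. `note_params_match` is a hypothetical name, not the scorer's method.

```python
import pandas as pd


def note_params_match(new: pd.DataFrame, old: pd.DataFrame, key: str = "noteId") -> bool:
    # Hypothetical helper. Element-wise `==` turns every column into bool, including
    # the key; moving the key into the index keeps it as an integer row label and
    # leaves only the parameter columns to compare.
    # Sorting rows and columns gives both frames identical labels, which `==`
    # between two DataFrames requires (it raises ValueError otherwise).
    a = new.set_index(key).sort_index().sort_index(axis=1)
    b = old.set_index(key).sort_index().sort_index(axis=1)
    # Exact equality, as in the original assertion.
    return bool((a == b).all().all())
```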
INFO:birdwatch.matrix_factorization:epoch 0 0.16483864188194275
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.1265230029821396
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 39.86 secs (0.66 mins)
INFO:birdwatch.mf_base_scorer:sn cols: Index(['noteId', 'ratingWeight', 'notHelpfulOtherAdjusted',
'notHelpfulIncorrectAdjusted',
'notHelpfulSourcesMissingOrUnreliableAdjusted',
'notHelpfulOpinionSpeculationOrBiasAdjusted',
'notHelpfulMissingKeyPointsAdjusted', 'notHelpfulOutdatedAdjusted',
'notHelpfulHardToUnderstandAdjusted',
'notHelpfulArgumentativeOrBiasedAdjusted', 'notHelpfulOffTopicAdjusted',
'notHelpfulSpamHarassmentOrAbuseAdjusted',
'notHelpfulIrrelevantSourcesAdjusted',
'notHelpfulOpinionSpeculationAdjusted',
'notHelpfulNoteNotNeededAdjusted', 'notHelpfulOtherAdjustedRatio',
'notHelpfulIncorrectAdjustedRatio',
'notHelpfulSourcesMissingOrUnreliableAdjustedRatio',
'notHelpfulOpinionSpeculationOrBiasAdjustedRatio',
'notHelpfulMissingKeyPointsAdjustedRatio',
'notHelpfulOutdatedAdjustedRatio',
'notHelpfulHardToUnderstandAdjustedRatio',
'notHelpfulArgumentativeOrBiasedAdjustedRatio',
'notHelpfulOffTopicAdjustedRatio',
'notHelpfulSpamHarassmentOrAbuseAdjustedRatio',
'notHelpfulIrrelevantSourcesAdjustedRatio',
'notHelpfulOpinionSpeculationAdjustedRatio',
'notHelpfulNoteNotNeededAdjustedRatio', 'helpfulOther',
'helpfulInformative', 'helpfulClear', 'helpfulEmpathetic',
'helpfulGoodSources', 'helpfulUniqueContext', 'helpfulAddressesClaim',
'helpfulImportantContext', 'helpfulUnbiasedLanguage', 'notHelpfulOther',
'notHelpfulIncorrect', 'notHelpfulSourcesMissingOrUnreliable',
'notHelpfulOpinionSpeculationOrBias', 'notHelpfulMissingKeyPoints',
'notHelpfulOutdated', 'notHelpfulHardToUnderstand',
'notHelpfulArgumentativeOrBiased', 'notHelpfulOffTopic',
'notHelpfulSpamHarassmentOrAbuse', 'notHelpfulIrrelevantSources',
'notHelpfulOpinionSpeculation', 'notHelpfulNoteNotNeeded', 'numRatings',
'noteAuthorParticipantId', 'classification', 'currentStatus',
'internalNoteIntercept', 'internalNoteFactor1',
'lowDiligenceNoteIntercept', 'internalNoteFactor1_max',
'internalNoteFactor1_median', 'internalNoteFactor1_min',
'internalNoteFactor1_refit_orig', 'internalNoteIntercept_median',
'internalNoteIntercept_refit_orig', 'ratingCount_all',
'ratingCount_neg_fac', 'ratingCount_pos_fac',
'internalNoteIntercept_max', 'internalNoteIntercept_min',
'notHelpfulIncorrect_interval', 'p_incorrect_user_interval',
'num_voters_interval', 'tf_idf_incorrect_interval',
'internalRatingStatus', 'internalActiveRules', 'activeFilterTags',
'crhBool', 'crnhBool', 'awaitingBool'],
dtype='object')
INFO:birdwatch.scorer:MFGroupScorer_6 Final compute scored notes elapsed time: 85.25 secs (1.42 mins)
INFO:birdwatch.scorer:Postprocessing output for MFGroupScorer_6
INFO:birdwatch.scorer: Original noteScores length: 1750506
INFO:birdwatch.scorer: Final noteScores length: 3949
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in score_final, at line 1190: noteScores = noteScores.merge(
PandasTypeError: Output mismatch on numFinalRoundRatings: result=float64 expected=int64 (allowed)
INFO:birdwatch.scorer:MFGroupScorer_5 Postprocess output elapsed time: 50.14 secs (0.84 mins)
INFO:birdwatch.run_scoring:MFGroupScorer_3 run_scorer_parallelizable just started in parallel: loading data from shared memory.
INFO:birdwatch.scorer: Ratings without assigned group: 0
INFO:birdwatch.scorer: Ratings after group filter: 1911572
INFO:birdwatch.scorer:MFGroupScorer_4 Filter input elapsed time: 39.07 secs (0.65 mins)
INFO:birdwatch.mf_base_scorer:seeding with 0
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note:
Num Ratings: 1529367, Num Unique Notes Rated: 60721, Num Unique Raters: 19245
INFO:birdwatch.scorer:MFGroupScorer_4 Prepare ratings elapsed time: 0.64 secs (0.01 mins)
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey]
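This downcasting FutureWarning fires when `fillna(False)` runs on an object-dtype column. Here the column behind `aboveHelpfulnessThresholdKey` presumably holds booleans mixed with NaN, which is typical after a bool column goes through a merge with unmatched rows. `fillna(False)` currently converts such a column to bool silently. The warning itself suggests calling `infer_objects(copy=False)` on the result. The sketch below takes a different route, assuming the column contains only True, False, and missing values. It passes through pandas' nullable `boolean` dtype, so there is no object array left to downcast. `fill_threshold_flags` and the column name `aboveThreshold` are hypothetical.

```python
import pandas as pd


def fill_threshold_flags(flags: pd.Series) -> pd.Series:
    # Hypothetical helper. Assumes `flags` holds only True/False/missing values.
    # Cast object -> nullable "boolean" first (missing values become <NA>),
    # fill on that dtype, then cast to plain NumPy bool; fillna never sees an
    # object array, so pandas has nothing to downcast.
    return flags.astype("boolean").fillna(False).astype(bool)


raters = pd.DataFrame({"raterParticipantId": ["a", "b", "c"]})
scores = pd.DataFrame({"raterParticipantId": ["a", "c"], "aboveThreshold": [True, False]})
merged = raters.merge(scores, on="raterParticipantId", how="left")
print(fill_threshold_flags(merged["aboveThreshold"]).tolist())  # [True, False, False]
```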
INFO:birdwatch.helpfulness_scores:Unique Raters: 9878
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 38413
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 10669
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 9878
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 968084
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 968084
INFO:birdwatch.matrix_factorization:------------------
INFO:birdwatch.matrix_factorization:Users: 9878, Notes: 60651
INFO:birdwatch.matrix_factorization:initializing notes
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True)
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True)
INFO:birdwatch.matrix_factorization:initializing users
INFO:birdwatch.matrix_factorization:initialized global intercept
INFO:birdwatch.matrix_factorization:learning rate set to :0.2
INFO:birdwatch.matrix_factorization:cpu
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 15.961550510296615
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 98.0040494027131
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 16.114954798678987
INFO:birdwatch.model:Freezing parameter: user_factors.weight
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight
INFO:birdwatch.model:Freezing parameter: global_intercept
INFO:birdwatch.matrix_factorization:epoch 0 0.14834263920783997
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.12264454364776611
INFO:birdwatch.matrix_factorization:epoch 20 0.09764505922794342
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06766458600759506
INFO:birdwatch.matrix_factorization:epoch 40 0.09427738934755325
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06417444348335266
INFO:birdwatch.matrix_factorization:epoch 60 0.09402671456336975
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06379815936088562
INFO:birdwatch.matrix_factorization:epoch 80 0.09399324655532837
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06373821943998337
INFO:birdwatch.matrix_factorization:Num epochs: 94
INFO:birdwatch.matrix_factorization:epoch 94 0.09398777782917023
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.06372278183698654
INFO:birdwatch.matrix_factorization:Global Intercept: 0.16988325119018555
INFO:birdwatch.scorer:MFGroupScorer_4 Final helpfulness-filtered MF elapsed time: 7.94 secs (0.13 mins)
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_4 final scoring, about to call diligence with 968084 final round ratings.
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:467: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
noteInitState[c.internalNoteInterceptKey] = noteInitState[c.internalNoteInterceptRound2Key]
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:470: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
raterInitState[c.internalRaterInterceptKey] = raterInitState[c.internalRaterInterceptRound2Key]
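The two reputation_matrix_factorization.py warnings are raised when a column of `noteInitState` or `raterInitState` is overwritten while pandas considers that frame a slice of another DataFrame. Taking an explicit `.copy()` is one fix. `DataFrame.assign` is another, because it always returns a new DataFrame. The sketch below uses `assign` and assumes the state frame was built by selecting rows from a larger frame; the log does not show how it is built. `use_round2_intercept` is a hypothetical helper. The column names are the ones shown in the noteInitState dump below.

```python
import pandas as pd


def use_round2_intercept(noteInitState: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper. assign() copies the frame and sets the column on the
    # copy, so nothing is written into a DataFrame this one may have been sliced from.
    return noteInitState.assign(
        internalNoteIntercept=noteInitState["internalNoteInterceptRound2"]
    )
```

The caller then rebinds the name (`noteInitState = use_round2_intercept(noteInitState)`), which keeps the intent of the original statement without the in-place write.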
INFO:birdwatch.reputation_matrix_factorization:Setup model: noteInitState: | |
noteId internalNoteIntercept internalNoteFactor1 \ | |
0 1682453260345528328 -2.105245 2.579850 | |
1 1708988303926862198 1.432146 0.505378 | |
2 1711418483634741262 -0.973364 1.666879 | |
3 1711435360398356934 -0.364056 1.728364 | |
4 1711499136887931319 -2.191805 0.143735 | |
... ... ... ... | |
59949 1820850280050618650 0.339240 1.049319 | |
59950 1820850814115557481 0.339505 1.045633 | |
59951 1806138066056404994 -0.535415 -0.787759 | |
59952 1790301132448833854 -0.179646 0.683219 | |
59953 1704681040659521639 -0.284608 -0.807413 | |
internalNoteInterceptRound2 | |
0 -2.105245 | |
1 1.432146 | |
2 -0.973364 | |
3 -0.364056 | |
4 -2.191805 | |
... ... | |
59949 0.339240 | |
59950 0.339505 | |
59951 -0.535415 | |
59952 -0.179646 | |
59953 -0.284608 | |
[59954 rows x 4 columns], | |
raterInitState: | |
raterParticipantId \ | |
0 000045A5FA0CF004F68CBF2913506C37D540CF48522D33... | |
1 00029D1FDD352D79B5073189C3F2BDF6377581F50D66C1... | |
2 00053CDCAC04E3692F4A01305C8F3D093CCE221157D539... | |
3 0005983E6E18862483AB372C5B61FEBC1F8A573E7701F9... | |
4 000C92F6B8127DF83BE8430A54BCA7ECF08071EC8E00E2... | |
... ... | |
38408 FFF3E935633C6870DE7674D0681C5821BC408073C84A36... | |
38409 FFF6DBEDE9ED4DC6A61291E33742D1805155E385475E43... | |
38410 FFF89590FF300D0348631F2F16AA908F663A888A3F82E0... | |
38411 FFFA43EFB0AAB3BFD273666FF123BFE69D863B9A2F5E44... | |
38412 FFFC011F23086D8153F0A3FF336F33EE80521EC35F9ACD... | |
internalRaterIntercept internalRaterFactor1 internalRaterReputation \ | |
0 NaN NaN NaN | |
1 NaN NaN NaN | |
2 NaN NaN NaN | |
3 NaN NaN NaN | |
4 NaN NaN NaN | |
... ... ... ... | |
38408 NaN NaN NaN | |
38409 NaN NaN NaN | |
38410 0.312969 -0.461409 0.532296 | |
38411 NaN NaN NaN | |
38412 NaN NaN NaN | |
internalRaterInterceptRound2 | |
0 NaN | |
1 NaN | |
2 NaN | |
3 NaN | |
4 NaN | |
... ... | |
38408 NaN | |
38409 NaN | |
38410 0.312969 | |
38411 NaN | |
38412 NaN | |
[38413 rows x 5 columns] | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 9878, vs. num we are initializing: 38413 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterFactor1s: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterFactor1s: 9878 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 9878, vs. num we are initializing: 38413 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 9878 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterReputation: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 9878, vs. num we are initializing: 38413 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterReputations: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterReputations: 9878 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteFactor1: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 60651, vs. num we are initializing: 59954 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteFactor1s: 59452 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteFactor1s: 1199 | |
INFO:birdwatch.reputation_matrix_factorization:Initializing internalNoteIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 60651, vs. num we are initializing: 59954 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalNoteIntercepts: 59452 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalNoteIntercepts: 1199 | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, initial round fitting reputation MF (equivalent to Round 2 in Prescoring - learn note factor) | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=5.006903 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=2.464097 | time=1.0s | |
INFO:birdwatch.run_scoring:MFGroupScorer_3 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_3 run_scorer_parallelizable: Loading data elapsed time: 21.76 secs (0.36 mins) | |
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_3 set to: 4 | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_3. Original rating length: 118317340 | |
INFO:birdwatch.scorer: Ratings after topic filter: 118317340 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=2.436134 | time=1.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=090 | loss=2.434052 | time=2.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=115 | loss=2.433925 | time=3.4s | |
INFO:birdwatch.reputation_matrix_factorization:Final scoring, final round fitting reputation MF: learn just note intercept | |
/home/ubuntu/communitynotes/sourcecode/scoring/reputation_matrix_factorization/reputation_matrix_factorization.py:505: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
raterInitState[c.internalRaterInterceptKey] = savedFinalRoundPrescoringRaterIntercept | |
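The SettingWithCopyWarning above appears when pandas assigns a column on a DataFrame it has flagged as derived from another frame, for example through a column subset or a boolean row filter. The sketch below assumes raterInitState is built that way, which this log does not confirm. The frame, the helper `make_init_state`, and the column names are hypothetical stand-ins, not the scorer's code. Taking an explicit `.copy()` makes the new frame independent of its parent and silences the warning.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for a prescoring rater table (the real columns are not shown in this log).
raters = pd.DataFrame(
    {
        "raterParticipantId": ["a", "b", "c"],
        "internalRaterIntercept": [0.1, 0.2, 0.3],
        "internalRaterFactor1": [0.5, -0.5, 0.0],
    }
)


def make_init_state(df: pd.DataFrame, intercepts: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: build an init-state frame and overwrite its intercepts."""
    # The explicit copy detaches the subset from `df`, so the assignment below
    # neither warns nor risks writing back into `df`.
    init_state = df[["raterParticipantId", "internalRaterIntercept"]].copy()
    # Plain Series assignment aligns on the index (0..2 here).
    init_state["internalRaterIntercept"] = intercepts
    return init_state


# Record warnings to confirm that the copy avoids SettingWithCopyWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    init_state = make_init_state(raters, pd.Series([1.0, 2.0, 3.0]))

# Match by class name so the check does not depend on where pandas exposes the class.
copy_warnings = [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]
```

Without the `.copy()`, pandas 2.2 (the version reported at the top of this log) would normally emit the warning on that assignment, because selecting a list of columns marks the result as a possible copy.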
INFO:birdwatch.reputation_matrix_factorization:Initializing internalRaterIntercept: | |
INFO:birdwatch.reputation_matrix_factorization: num in dataset: 9878, vs. num we are initializing: 38413 | |
INFO:birdwatch.reputation_matrix_factorization: uninitialized internalRaterIntercepts: 0 | |
INFO:birdwatch.reputation_matrix_factorization: initialized internalRaterIntercepts: 9878 | |
INFO:birdwatch.reputation_matrix_factorization:epoch=000 | loss=0.407297 | time=0.0s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=030 | loss=0.386404 | time=0.9s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.386157 | time=1.7s | |
INFO:birdwatch.reputation_matrix_factorization:epoch=060 | loss=0.386157 | time=1.7s | |
INFO:birdwatch.diligence_model:Low diligence final loss: 0.3862 | |
INFO:birdwatch.mf_base_scorer:diligenceNP cols: Index(['noteId', 'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], dtype='object') | |
INFO:birdwatch.mf_base_scorer:np cols: Index(['noteId', 'noteIndex', 'internalNoteIntercept', 'internalNoteFactor1', | |
'internalNoteFactor1_max', 'internalNoteFactor1_median', | |
'internalNoteFactor1_min', 'internalNoteFactor1_refit_orig', | |
'internalNoteIntercept_median', 'internalNoteIntercept_refit_orig', | |
'ratingCount_all', 'ratingCount_neg_fac', 'ratingCount_pos_fac', | |
'internalNoteIntercept_max', 'internalNoteIntercept_min', | |
'lowDiligenceNoteIntercept', 'lowDiligenceNoteFactor1'], | |
dtype='object') | |
INFO:birdwatch.scorer:MFGroupScorer_4 Low Diligence Reputation Model elapsed time: 6.52 secs (0.11 mins) | |
INFO:birdwatch.mf_base_scorer:About to call compute_scored_notes with MFGroupScorer_4 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_note_stats, at line 322: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on numRatings: result=float64 expected=int64 (allowed) | |
PandasTypeError: Output mismatch on numRatingsLast28: result=float64 expected=int64 (allowed) | |
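The MERGE ERROR lines report that count columns such as numRatings come out of a merge as float64 instead of int64. The scorer marks this as allowed. In pandas, this commonly happens after a left or outer merge in which some keys have no match. The unmatched rows receive NaN, and an int64 column cannot hold NaN, so pandas upcasts the column to float64. The sketch below uses hypothetical frames to reproduce this and shows two ways to get an integer dtype back. Whether "no match" should mean zero is an assumption that depends on the scorer's semantics.

```python
import pandas as pd

# Hypothetical inputs: all notes, and rating counts for only some of them.
notes = pd.DataFrame({"noteId": [1, 2, 3]})
counts = pd.DataFrame({"noteId": [1, 3], "numRatings": [5, 7]})

# Note 2 has no match, so its numRatings is NaN and the column becomes float64.
merged = notes.merge(counts, on="noteId", how="left")

# Option 1 (assumes "missing" means zero ratings): fill, then cast back to int64.
zero_filled = merged.assign(numRatings=merged["numRatings"].fillna(0).astype("int64"))

# Option 2: keep the missing value explicit with pandas' nullable Int64 dtype.
nullable = merged.astype({"numRatings": "Int64"})
```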
INFO:birdwatch.matrix_factorization:epoch 20 0.1349051296710968 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10503225773572922 | |
INFO:birdwatch.scorer: Original noteScores length: 1750506 | |
INFO:birdwatch.scorer: Final noteScores length: 36318 | |
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/mf_base_scorer.py, in score_final, at line 1190: noteScores = noteScores.merge( | |
PandasTypeError: Output mismatch on numFinalRoundRatings: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.scorer:MFGroupScorer_6 Postprocess output elapsed time: 55.18 secs (0.92 mins) | |
INFO:birdwatch.run_scoring:MFGroupScorer_2 run_scorer_parallelizable just started in parallel: loading data from shared memory. | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 494: noteStats = tagAggregates.merge(noteStats, on=c.noteIdKey, how="outer") | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:compute_scored_notes: compute tag aggregates elapsed time: 3.57 secs (0.06 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:82: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals.drop(columns_to_drop, inplace=True, axis=1) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:84: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ( | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:90: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:91: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRateByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:94: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:95: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.incorrectTagRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:98: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey].fillna(0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/incorrect_filter.py:99: SettingWithCopyWarning: | |
A value is trying to be set on a copy of a slice from a DataFrame. | |
Try using .loc[row_indexer,col_indexer] = value instead | |
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy | |
ratings_w_user_totals[c.totalRatingsMadeByRaterKey] = ratings_w_user_totals[ | |
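The FutureWarning text above already names two replacements for the chained `df[col].fillna(..., inplace=True)` pattern. The sketch below applies both to a hypothetical stand-in for ratings_w_user_totals. Its column names are illustrative and are not the actual values of the scorer's `c.*Key` constants. The accompanying SettingWithCopyWarning suggests that the real frame is itself a slice of another DataFrame. If so, taking `.copy()` when the slice is created would also avoid that second warning.

```python
import pandas as pd

# Hypothetical stand-in for ratings_w_user_totals; column names are illustrative.
ratings = pd.DataFrame(
    {
        "raterParticipantId": ["a", "b", "c"],
        "incorrectTagRateByRater": [0.2, None, 0.5],
        "totalRatingsMadeByRater": [10.0, None, 4.0],
    }
)

# Chained form that triggers the FutureWarning (and never modifies the frame under copy-on-write):
#   ratings["incorrectTagRateByRater"].fillna(0, inplace=True)

# Replacement 1: compute the filled column and assign it back.
ratings["incorrectTagRateByRater"] = ratings["incorrectTagRateByRater"].fillna(0)

# Replacement 2: call fillna on the DataFrame itself with a per-column mapping.
ratings.fillna({"totalRatingsMadeByRater": 0}, inplace=True)
```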
MERGE ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/note_ratings.py, in compute_scored_notes, at line 499: noteStats = noteStats.merge( | |
PandasTypeError: Output mismatch on num_voters_interval: result=float64 expected=int64 (allowed) | |
INFO:birdwatch.constants:compute_scored_notes: compute incorrect aggregates elapsed time: 3.50 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: InitialNMR (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: InitialNMR (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: InitialNMR (v1.0) elapsed time: 0.62 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRH (v1.0) elapsed time: 0.12 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRH (v1.0) elapsed time: 0.75 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRNH (v1.0) elapsed time: 0.02 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRNH (v1.0) elapsed time: 0.62 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: UcbCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: UcbCRNH (v1.0) elapsed time: 0.01 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: UcbCRNH (v1.0) elapsed time: 0.61 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: NmCRNH (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: NmCRNH (v1.0) elapsed time: 0.09 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: NmCRNH (v1.0) elapsed time: 0.72 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: GeneralCRHInertia (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: GeneralCRHInertia (v1.0) elapsed time: 0.92 secs (0.02 mins) | |
INFO:birdwatch.constants:Applying scoring rule: GeneralCRHInertia (v1.0) elapsed time: 1.48 secs (0.02 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: TagFilter (v1.0) | |
INFO:birdwatch.scoring_rules:Candidate notes prior to tag filtering: 1747373 | |
INFO:birdwatch.scoring_rules:Checking note tags: | |
INFO:birdwatch.scoring_rules:notHelpfulOther | |
INFO:birdwatch.scoring_rules:notHelpfulIncorrect | |
INFO:birdwatch.scoring_rules:notHelpfulSourcesMissingOrUnreliable | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculationOrBias | |
INFO:birdwatch.scoring_rules:notHelpfulMissingKeyPoints | |
INFO:birdwatch.scoring_rules:notHelpfulOutdated | |
INFO:birdwatch.scoring_rules:notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:outlier filtering disabled for tag: notHelpfulHardToUnderstand | |
INFO:birdwatch.scoring_rules:notHelpfulArgumentativeOrBiased | |
INFO:birdwatch.scoring_rules:notHelpfulOffTopic | |
INFO:birdwatch.scoring_rules:notHelpfulSpamHarassmentOrAbuse | |
INFO:birdwatch.scoring_rules:notHelpfulIrrelevantSources | |
INFO:birdwatch.scoring_rules:notHelpfulOpinionSpeculation | |
INFO:birdwatch.scoring_rules:notHelpfulNoteNotNeeded | |
INFO:birdwatch.scoring_rules:Total {note, tag} pairs where tag filter logic triggered: 3905 | |
INFO:birdwatch.scoring_rules:Total unique notes impacted by tag filtering: 2092 | |
INFO:birdwatch.constants:Calling score_notes: TagFilter (v1.0) elapsed time: 2.83 secs (0.05 mins) | |
CONCAT ERROR(S) AT: /home/ubuntu/communitynotes/sourcecode/scoring/scoring_rules.py, in apply_scoring_rules, at line 1099: noteColumns = noteColumns.merge( | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
PandasTypeError: Type expectation mismatch on noteId: found=float64 expected=int64 | |
INFO:birdwatch.constants:Applying scoring rule: TagFilter (v1.0) elapsed time: 3.43 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: CRHSuperThreshold (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: CRHSuperThreshold (v1.0) elapsed time: 0.10 secs (0.00 mins) | |
INFO:birdwatch.constants:Applying scoring rule: CRHSuperThreshold (v1.0) elapsed time: 0.69 secs (0.01 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: ElevatedCRHInertia (v1.0) | |
INFO:birdwatch.constants:Calling score_notes: ElevatedCRHInertia (v1.0) elapsed time: 0.88 secs (0.01 mins) | |
INFO:birdwatch.constants:Applying scoring rule: ElevatedCRHInertia (v1.0) elapsed time: 1.46 secs (0.02 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterIncorrect (v1.0) | |
INFO:birdwatch.scoring_rules:Total notes impacted by incorrect filtering: 531 | |
INFO:birdwatch.constants:Calling score_notes: FilterIncorrect (v1.0) elapsed time: 2.77 secs (0.05 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterIncorrect (v1.0) elapsed time: 3.37 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLowDiligence (v1.0) | |
INFO:birdwatch.scoring_rules:Total notes impacted by low diligence filtering: 12369 | |
INFO:birdwatch.constants:Calling score_notes: FilterLowDiligence (v1.0) elapsed time: 2.71 secs (0.05 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterLowDiligence (v1.0) elapsed time: 3.30 secs (0.06 mins) | |
INFO:birdwatch.scoring_rules:Applying scoring rule: FilterLargeFactor (v1.0) | |
INFO:birdwatch.run_scoring:MFGroupScorer_2 run_scorer_parallelizable just finished loading data from shared memory. | |
INFO:birdwatch.constants:MFGroupScorer_2 run_scorer_parallelizable: Loading data elapsed time: 21.42 secs (0.36 mins) | |
INFO:birdwatch.mf_base_scorer:score_final: Torch intra-op parallelism for MFGroupScorer_2 set to: 4 | |
INFO:birdwatch.scoring_rules:Total notes impacted by large factor filtering: 21 | |
INFO:birdwatch.constants:Calling score_notes: FilterLargeFactor (v1.0) elapsed time: 0.74 secs (0.01 mins) | |
INFO:birdwatch.constants:Applying scoring rule: FilterLargeFactor (v1.0) elapsed time: 1.32 secs (0.02 mins) | |
INFO:birdwatch.scorer:Filtering ratings for MFGroupScorer_2. Original rating length: 118317340 | |
INFO:birdwatch.scorer: Ratings after topic filter: 118317340 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.scorer: Ratings after group filter: 6154771 | |
INFO:birdwatch.scorer:MFGroupScorer_3 Filter input elapsed time: 42.65 secs (0.71 mins) | |
INFO:birdwatch.mf_base_scorer:seeding with 0 | |
INFO:birdwatch.process_data:After applying min 0 ratings per rater and min 5 raters per note: | |
Num Ratings: 5512042, Num Unique Notes Rated: 169270, Num Unique Raters: 67680 | |
INFO:birdwatch.scorer:MFGroupScorer_3 Prepare ratings elapsed time: 2.56 secs (0.04 mins) | |
/home/ubuntu/communitynotes/sourcecode/scoring/helpfulness_scores.py:221: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)` | |
helpfulnessScores[c.aboveHelpfulnessThresholdKey].fillna(False), [c.raterParticipantIdKey] | |
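This downcasting FutureWarning is raised when `fillna(False)` runs on an object-dtype column, typically a boolean column that has picked up missing values. Pandas 2.2 then silently converts the result to bool. The warning itself recommends opting in with `pd.set_option('future.no_silent_downcasting', True)` and calling `infer_objects(copy=False)` on the result. The sketch below takes a different route and assumes the column holds only True, False and missing values. It converts the column to pandas' nullable "boolean" dtype first, which leaves fillna nothing to downcast. The frame and column names are hypothetical.

```python
import pandas as pd

# Hypothetical helpfulness table: a boolean column with a missing value has object dtype.
scores = pd.DataFrame(
    {
        "raterParticipantId": ["a", "b", "c"],
        "aboveHelpfulnessThreshold": [True, None, False],
    }
)

# Convert to the nullable "boolean" dtype, treat missing as False, then get a plain numpy bool mask.
mask = scores["aboveHelpfulnessThreshold"].astype("boolean").fillna(False).astype(bool)

# Use the mask to select rater ids, mirroring the .loc-style call shown in the warning.
included = scores.loc[mask, ["raterParticipantId"]]
```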
INFO:birdwatch.helpfulness_scores:Unique Raters: 34478 | |
INFO:birdwatch.helpfulness_scores:People (Authors or Raters) With Helpfulness Scores: 101066 | |
INFO:birdwatch.helpfulness_scores:Raters Included Based on Helpfulness Scores: 38195 | |
INFO:birdwatch.helpfulness_scores:Included Raters who have rated at least 1 note in the final dataset: 34478 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings Used For 1st Training: 3236790 | |
INFO:birdwatch.helpfulness_scores:Number of Ratings for Final Training: 3236790 | |
INFO:birdwatch.matrix_factorization:------------------ | |
INFO:birdwatch.matrix_factorization:Users: 34478, Notes: 168525 | |
INFO:birdwatch.matrix_factorization:initializing notes | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:187: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.internalNoteInterceptKey].fillna(0.0, inplace=True) | |
/home/ubuntu/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py:193: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. | |
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. | |
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. | |
noteInit[c.note_factor_key(i)].fillna(0.0, inplace=True) | |
INFO:birdwatch.matrix_factorization:initializing users | |
INFO:birdwatch.matrix_factorization:initialized global intercept | |
INFO:birdwatch.matrix_factorization:learning rate set to :0.2 | |
INFO:birdwatch.matrix_factorization:cpu | |
INFO:birdwatch.matrix_factorization:Ratings per note in dataset: 19.206586559857588 | |
INFO:birdwatch.matrix_factorization:Ratings per user in dataset: 93.87986542142816 | |
INFO:birdwatch.matrix_factorization:Correcting loss function to simulate rating per note loss ratio = 19.308334776093563 | |
INFO:birdwatch.model:Freezing parameter: user_factors.weight | |
INFO:birdwatch.model:Freezing parameter: user_intercepts.weight | |
INFO:birdwatch.model:Freezing parameter: global_intercept | |
INFO:birdwatch.matrix_factorization:epoch 0 0.1374596357345581 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10155127942562103 | |
INFO:birdwatch.matrix_factorization:epoch 20 0.10550785064697266 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.0763513594865799 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.13380275666713715 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.10398941487073898 | |
INFO:birdwatch.matrix_factorization:epoch 40 0.10374687612056732 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07503622770309448 | |
INFO:birdwatch.scorer: Ratings without assigned group: 0 | |
INFO:birdwatch.matrix_factorization:epoch 60 0.10353998094797134 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07484707236289978 | |
INFO:birdwatch.matrix_factorization:Num epochs: 61 | |
INFO:birdwatch.matrix_factorization:epoch 61 0.10353998094797134 | |
INFO:birdwatch.matrix_factorization:TRAIN FIT LOSS: 0.07484707236289978 | |
INFO:birdwatch.matrix_factorization:Global Intercept: 0.17145097255706787 | |
INFO:birdwatch.scorer:MFGroupScorer_3 Final helpfulness-filtered MF elapsed time: 19.26 secs (0.32 mins) | |
INFO:birdwatch.mf_base_scorer:In MFGroupScorer_3 final scoring, about to call diligence with 3236790 final round ratings. | |
INFO:birdwatch.constants:Condense noteRules after applying all scoring rules elapsed time: 37.98 secs (0.63 mins) | |
INFO:birdwatch.mf_base_scorer:sn cols: Index(['noteId', 'ratingWeight', 'notHelpfulOtherAdjusted', | |
'notHelpfulIncorrectAdjusted', | |
'notHelpfulSourcesMissingOrUnreliableAdjusted', | |
'notHelpfulOpinionSpeculationOrBiasAdjusted', | |
'notHelpfulMissingKeyPointsAdjusted', 'notHelpfulOutdatedAdjusted', | |
'notHelpfulHardToUnderstandAdjusted', | |
'notHelpfulArgumentativeOrBiasedAdjusted', 'notHelpfulOffTopicAdjusted', | |
'notHelpfulSpamHarassmentOrAbuseAdjusted', | |
'notHelpfulIrrelevantSourcesAdjusted', | |
'notHelpfulOpinionSpeculationAdjusted', | |
'notHelpfulNoteNotNeededAdjusted', 'notHelpfulOtherAdjustedRatio', | |
'notHelpfulIncorrectAdjustedRatio', | |
'notHelpfulSourcesMissingOrUnreliableAdjustedRatio', | |
'notHelpfulOpinionSpeculationOrBiasAdjustedRatio', | |
'notHelpfulMissingKeyPointsAdjustedRatio', | |
'notHelpfulOutdatedAdjustedRatio', | |
'notHelpfulHardToUnderstandAdjustedRatio', | |
'notHelpfulArgumentativeOrBiasedAdjustedRatio', | |
'notHelpfulOffTopicAdjustedRatio', | |
'notHelpfulSpamHarassmentOrAbuseAdjustedRatio', | |
'notHelpfulIrrelevantSourcesAdjustedRatio', | |
'notHelpfulOpinionSpeculationAdjustedRatio', | |
'notHelpfulNoteNotNeededAdjustedRatio', 'helpfulOther', | |
'helpfulInformative', 'helpfulClear', 'helpfulEmpathetic', | |
'helpfulGoodSources', 'helpfulUniqueContext', 'helpfulAddressesClaim', | |
'helpfulImportantContext', 'helpfulUnbiasedLanguage', 'notHelpfulOther', | |
'notHelpfulIncorrect', 'notHelpfulSourcesMissingOrUnreliable', |