Yahoo Answers: what it was and why it shut down
The format is very simple to understand: questions and answers. The premise is interesting precisely because anyone could answer you, and it was then up to you to decide whether or not to believe the answer.
But who enforced the rules? The users themselves, by reporting violations of the guidelines to the site's moderation team. A points system was put in place to encourage people to use the platform properly, and you earned some points just for signing up.
Logging in earned one point per day. Asking a question cost 5 points, but choosing a best answer for it gave 3 back, and answering someone else's question earned 2 points. The number of questions and answers a user could post was also limited by their level, on a scale from 1 to 7. Yes, many serious and important questions were posted on Yahoo Answers, but the site became "famous" precisely for the memes.
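The point values above are enough to sketch the bookkeeping in a few lines of code. The following Python snippet is a hypothetical illustration based only on the values quoted in this article (daily login +1, asking -5, +3 back when a best answer is chosen, answering +2); it is not Yahoo's actual implementation, and the starting balance and level thresholds are invented for the example.

```python
# Hypothetical sketch of the Yahoo Answers point system described above.
# Point values come from the article; the sign-up bonus and level
# thresholds are made up for illustration only.

LEVEL_THRESHOLDS = [1, 250, 1000, 2500, 5000, 10000, 25000]  # levels 1..7 (assumed)

class Account:
    def __init__(self, starting_points=100):   # assumed sign-up bonus
        self.points = starting_points

    def daily_login(self):
        self.points += 1          # one point per day for logging in

    def ask_question(self):
        self.points -= 5          # asking a question costs 5 points

    def choose_best_answer(self):
        self.points += 3          # choosing a best answer gives 3 points back

    def answer_question(self):
        self.points += 2          # answering someone else's question earns 2 points

    @property
    def level(self):
        # Highest level whose threshold the current balance reaches.
        return max((i + 1 for i, t in enumerate(LEVEL_THRESHOLDS)
                    if self.points >= t), default=1)

acct = Account()
acct.daily_login()
acct.ask_question()
acct.choose_best_answer()
print(acct.points, acct.level)   # 99 points, level 1
```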
Remember some of the pearls? If you have thought of a question, it has almost certainly already been asked on the site. Since the new purple layout was launched, Yahoo Answers has been in decline: low credibility and a lot of nonsense drove away the more "serious" part of the community. At the beginning of the month, an email was sent to all Yahoo Answers users announcing the site's closure, citing a drop in popularity among the reasons. The message goes on to say that attention will be focused on products and areas that attract more users, and on premium content.
Although the official justification is declining popularity, there are also questions about whether the closure is related to the spread of fake news on the platform, a subject very much on the agenda today. Where, then, can you resolve your existential doubts on the internet, besides Google, of course?
Here are some alternatives similar to Yahoo Answers. Red Ask is possibly the new crowd favorite: many Yahoo Answers users have already migrated there, and the number of registered users keeps growing.
The site works more or less like Yahoo Answers, but offers a smaller set of categories for your questions: curiosities, politics, news and current affairs, advice, music and entertainment, sexuality, sports, debates and opinions, science, astronomy and space, and questions about the site itself.
Red Ask also lets you tag questions, making it easier to find the subject you are looking for, and you can create polls for other users to vote on. It is a very simple site, and you will find there several familiar names from Yahoo Answers.
If you still have plenty of questions to ask, it may be a good option. Creating an account is easy: just enter your name, password and email, and wait for the account to be approved before using the site.
To get started you need an account to access the site's content, but creating one is simple: you can sign in with your Google or Facebook account, or register a separate account with a username, password and email.
After creating the account, you can choose subjects of interest from a very wide list of topics. As you pick, more options appear based on what you have already chosen, and the site assembles a personalized interface around those interests.
It is also possible to receive private messages from other users (you can control who may message you in the settings) and notifications, which can also be configured.

With more than a billion posted answers, Yahoo! Answers is currently the largest existing CQA site. The system is question-centric rather than user-centric, unlike more recent systems such as Quora, where users can follow questions to receive activity updates and follow the activity of other users. In classic recommender systems, profiles built from past activity allow programs to associate users with matching products. In Yahoo! Answers, however, the items to recommend (the questions) are new to the system and have experienced very few, if any, user actions; hence, user-user methods are more relevant to our task. We must consider such scenarios as well, since the most interesting questions are often those that did not attract mass user interactions and have not yet been answered in a satisfactory manner.
One of the main problems in Yahoo! Answers is the high variance in perceived question and answer quality; this problem has been the object of numerous research studies in recent years. A question remains open for a limited period, or less if the asker chooses a best answer within that period. Our approach is different: we chose not to explicitly model whether a winning (best) answer arises. First, learning such a model is non-trivial when dealing with volatile questions and relatively new users. Second, as justified later, we would like to use a multi-channel approach.

Among the more closely related works, one proposed system is centered around external sources of social relations between users and on the assignment of questions and users into predefined topics. Our setup requires a different treatment: external social relations are not available, but we need to differentiate between a rich variety of user-question interaction signals accessible within Yahoo! Answers. Our approach is also very different from expert-search tasks such as [18, 4], which try to identify an authoritative answer that would satisfy most; in our context the key objective is to satisfy the sole asker, across a variety of questions, some factoid but many subjective, where the notion of expertise is irrelevant.

Finally, an interesting related work deals with Google News recommendation [6]. The problem there is similar to ours in terms of scalability and the volatility of items with a high churn rate. The authors use a blend of three separate CF methods in a highly scalable fashion. We follow a different direction, combining content analysis (CA) with collaborative filtering (CF) and addressing separate types of user feedback.

[Figure 2: Three families of question attributes]
Recommender systems face multiple recognized challenges, such as non-uniform data patterns (each user interacting with a different set of items), data sparseness, noisy user input, and scalability. Beyond these, the task of recommending questions brings additional challenges, notably the interplay between social and content-based signals, since the textual content annotating a question is itself the result of a social interaction. We believe that such challenges are, and will continue to be, typical of other recommendation scenarios, especially those involving user-generated content. Hence, a cornerstone of our modeling choice is to smoothly merge social-based collaborative filtering with attribute-based content analysis. This decision also greatly benefits the computational complexity of our model: with millions of questions in our inventory and several new questions arriving each second, both time and space complexities become easier to manage when each question is not modeled in isolation.

Multiple question descriptors need to be exploited. A question may be described by textual attributes, which come from the question itself and from its answers. Another question descriptor may be its category, as selected by the asker. Finally, following a common approach in collaborative filtering, a question may also be described by the users who interacted with it. A major challenge arising here is finding a smooth and natural way of integrating these different attribute types, while expressing the different significance level of each. This becomes even more challenging when considering the multiple channels through which users can interact with a question, as detailed below.
A second factor comes from the need to account for the multiple kinds of interactions, of different intensities, between users and questions. When data per user and item is scarce, exploiting these diverse types of user-question interaction is especially valuable. Yet, care should be taken when integrating different signals, since some types of interactions should be more indicative than others; for example, we would expect answering to be a more meaningful interaction than mailing a question to another user. Our approach induces hundreds of interactions, so we must rely on an automated procedure for combining them together effectively. The model is flexible, as it is easy to add more types of interaction as they become available; some signals may be weak at the individual question level, yet they still contribute.

This represents a significant departure from the common recommender system setup, where the item-set is close to static and a distinct set of parameters is usually dedicated to each item. Our approach to modeling questions is different: we refrain from directly modeling each question and instead represent questions through their attributes.

We now present our recommender system model, MCR, for assessing the match between a user and a question. We first explain how questions and users are mapped into their attribute representation, then how multiple features are derived from the multi-channel attributes of users and questions, and finally how user- and question-specific features are incorporated.

The first family of attributes encodes textual information and takes text tokens as values. These tokens are extracted from the various textual fields associated with the question: its title, its body, its best answer and, finally, its other answers. For each text block, our tokenizer annotates each word with its part-of-speech (POS) tag and lemma; the output of this process is a list of terms. As previously mentioned, question fields are of varying importance, so the extracted terms are counted separately within each field, producing four sets of (term, count) pairs as the values of four textual attributes. We then filter out non-representative terms (trivial examples being stop words). This is done in two steps, using a category-based score: since each category typically contains a lot of text related to a specific topic, the score HC(t) provides a principled way to perform a tf-idf style ranking in which categories play the role of documents, and we keep only the N best terms, those of lowest L(t) value. Table 1 lists the textual and user-id attributes for our example question, "Does the introduction of new technology ruin the tension in football? If so: why?".
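To make the textual-attribute pipeline concrete, here is a minimal sketch of the field-wise term counting and the category-based, tf-idf-style filtering described above. It is only an approximation: it assumes simple whitespace tokenization, omits POS tagging and lemmatization, and uses an idf-over-categories score as a stand-in for the paper's HC(t)/L(t) ranking, whose exact formulas are not recoverable from this excerpt.

```python
from collections import Counter
from math import log

STOP_WORDS = {"the", "of", "in", "a", "an", "does", "so", "why", "if"}

def tokenize(text):
    # Stand-in for the real tokenizer (which also adds POS tags and lemmas).
    return [w.lower().strip("?,.!:") for w in text.split()]

def field_term_counts(question):
    # One (term -> count) map per textual field: title, body, best answer, other answers.
    return {field: Counter(t for t in tokenize(text)
                           if t and t not in STOP_WORDS)
            for field, text in question.items()}

def category_score(term, category_term_counts):
    # tf-idf-style score where categories play the role of documents:
    # terms spread over many categories score low, category-specific terms score high.
    n_cats = len(category_term_counts)
    df = sum(1 for counts in category_term_counts.values() if term in counts)
    return log(n_cats / df) if df else 0.0

def top_terms(counts, category_term_counts, n=20):
    # Keep only the N most category-specific terms of a field.
    return sorted(counts, key=lambda t: -category_score(t, category_term_counts))[:n]

question = {"title": "Does the introduction of new technology ruin the tension in football? If so: why?",
            "body": "", "best_answer": "", "other_answers": ""}
cats = {"Sports": Counter({"football": 50, "technology": 2}),
        "Tech": Counter({"technology": 80, "introduction": 5})}
title_counts = field_term_counts(question)["title"]
print(top_terms(title_counts, cats, n=3))
```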
Another family of attributes reflects the category of the question, which the asker has to select from a predefined taxonomy when posting. We obviously select the user-selected category, but also add its parent and grand-parent categories, when available, in order to inherit semantic similarities. We limit ourselves to the direct, parent and grand-parent categories, as this is the current maximal depth of the Yahoo! Answers taxonomy; for the previously mentioned example question, the direct, parent and grand-parent category attributes are instantiated accordingly.

In our current implementation, a question q is represented by a matrix Qq. The d1 columns of the matrix correspond to each individual textual token, category and user id; the d2 rows correspond to the attributes, and Qq[i][j] holds the count for value j of attribute i. The remaining rows of Qq correspond to the user attribute family. The matrix is saved in a sparse form.
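The question representation just described can be held as a small sparse matrix: attribute rows (four textual fields, three category levels, and the user-interaction channels) by value columns (tokens, category ids, user ids), with Qq[i][j] storing the count of value j for attribute i. The sketch below uses nested dictionaries as the sparse storage; the textual and category attribute names come from the text, while the seven social channel names are assumed for illustration.

```python
from collections import defaultdict

# Attribute rows of the question matrix Q_q, as described in the text.
TEXT_ATTRS = ["title", "body", "best_answer", "other_answers"]
CATEGORY_ATTRS = ["category", "parent_category", "grandparent_category"]
SOCIAL_ATTRS = ["asked", "answered", "best_answered", "voted",
                "starred", "commented", "mailed"]   # channel names assumed

def empty_question_matrix():
    # Sparse matrix: attribute -> {value -> count}.
    return {attr: defaultdict(int)
            for attr in TEXT_ATTRS + CATEGORY_ATTRS + SOCIAL_ATTRS}

q = empty_question_matrix()
q["title"]["football"] += 1            # textual token count
q["category"]["Sports>Football"] += 1  # user-selected category
q["answered"]["user_42"] += 1          # a user who answered this question
```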
The third family of attributes relates to the identity of the users who interacted with a question. Users can interact with a question in various ways (for example by asking it, answering it, voting for a best answer, or mailing it to another user), and each kind of interaction deserves a different treatment.

User attributes borrow much of the basic structure of the question attributes. Additionally, users may explicitly pick their preferences over attributes within each of the attribute families. We next describe these two types of attribute sources and their formal modeling.

Since the different kinds of user-question interactions deserve separate treatment, we keep them apart by adding another dimension to the user repository, called channels. Note that these seven channels are exactly those that were used for the social attributes annotating a question; here they serve a different purpose, associating a user with questions rather than annotating a question with users. In Yahoo! Answers, the only explicit relation currently available is which users are followed by the modeled user; these are stored within an additional channel that we refer to as the explicit relations channel, so the textual and category families within that channel remain empty. The other two dimensions correspond to attributes and values, in analogy to the question representation.

For all channels except the last one, we derive attributes from the corresponding question attributes. This is accomplished in the following manner. We denote by Q_c(u) the set of questions with which user u interacted through channel c. Then, the corresponding part of the user attributes is defined as the sum U_u[c] = Σ_{q ∈ Q_c(u)} Q_q. Importantly, this aggregation still preserves the multi-family representation of the aggregated question attributes. While we chose to aggregate by summation, an alternative would be averaging; however, averaging loses the relative support of the channel, which dictates its relative importance compared to other channels. Similarly, for user-ID attributes we aggregate, among others, all the users who interacted with questions for which the target user gave the best answer. This gives rise to multiple modes of user-user relations; for example, relations such as (best answered, asked) correspond to users who asked questions for which the target user gave the best answer. One could keep only the user relations reflecting the strongest user-user ties, but we currently do not employ this option, as our data is already sparse enough and does not require any further sparsification.
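A minimal sketch of the per-channel aggregation U_u[c] = Σ_{q ∈ Q_c(u)} Q_q described above follows. It assumes the nested-dictionary question layout from the earlier snippet; the channel and attribute names are illustrative. Plain summation is used, since averaging would wipe out the relative support of each channel.

```python
from collections import defaultdict

def aggregate_user_attributes(questions_by_channel):
    """questions_by_channel: channel name -> list of question matrices
    (each a dict attribute -> {value -> count}, as in the earlier sketch)."""
    user = {}
    for channel, questions in questions_by_channel.items():
        channel_attrs = defaultdict(lambda: defaultdict(float))
        for q in questions:
            for attr, values in q.items():
                for value, count in values.items():
                    # Summation keeps the channel's relative support.
                    channel_attrs[attr][value] += count
        user[channel] = channel_attrs
    return user

# Example: a user who answered two questions and asked none.
q1 = {"title": {"football": 1, "technology": 1}, "category": {"Sports>Football": 1}}
q2 = {"title": {"football": 2}, "category": {"Sports>Football": 1}}
u = aggregate_user_attributes({"answered": [q1, q2], "asked": []})
print(dict(u["answered"]["title"]))     # {'football': 3.0, 'technology': 1.0}
print(dict(u["answered"]["category"]))  # {'Sports>Football': 2.0}
```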
Furthermore, our model addresses additional kinds of implicit social relations: all textual information entered by a user for a question is added to the attributes of all the other users who interacted with that question. The resulting hybrid of textual and social features is a key to the success of the model, as it alleviates the sparseness problem of textual attributes.

To assess the match between a user and a question, their respective attributes are compared, generating interaction features; these features are then used by a classifier to evaluate the match. Pairing each question attribute with each user attribute creates multiple features. Since attributes of different families share no common values, we concentrate only on attribute pairs within the same family. Indeed, a classifier can easily handle and weigh together such a number of possibilities, as discussed below. The benefit of using multiple types of interaction lies in the flexibility to weigh each interaction appropriately through data-driven learning, without worrying about a dilution effect caused by adding less important interactions.

Note that a user attribute is a composite index, consisting of a channel and a question attribute. For each question attribute and user attribute of the same family, we create a distinct interaction feature by measuring the modified cosine similarity between their corresponding attribute vectors, a common way of measuring interaction. The interaction feature obtained by matching question attribute s with user attribute (c, s) is the dot product of Q_q[s] and U_u[c, s], normalized by ||Q_q[s]|| only; as mentioned earlier, the usual ||U_u[c, s]|| normalizer would be unwise, as it wipes out all information on the relative strength of the various channels for the specific user.

In our data, the textual family consists of four attributes: title, body, best answer, and other answers. In the user model, these attributes are multiplied by the seven aforementioned channels. Therefore, four question attributes interact with 28 user attributes, producing 112 separate interaction features. Similarly, the three attributes of the category family on the question side interact with 21 user-side category attributes, yielding 63 interaction features. Finally, the 7 social attributes of a question interact with the 50 social attributes of a user (including the added attribute from the explicit relations channel), adding 350 social interaction features. In total, 525 interaction features are employed.
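Under these definitions, each interaction feature pairs one question attribute with one user attribute of the same family (the latter indexed by channel), using a modified cosine that normalizes only by the question vector, since dividing by ||U_u[c, s]|| would erase the relative strength of the channels. The sketch below assumes the dictionary layout of the earlier snippets and mirrors the counting in the text (4x28 = 112 textual, 3x21 = 63 category, 7x50 = 350 social features); names are illustrative.

```python
from math import sqrt

def modified_cosine(q_vec, u_vec):
    """Dot product normalized only by the question vector's norm.

    q_vec, u_vec: dicts mapping values (tokens, category ids, user ids) to counts.
    """
    dot = sum(count * u_vec.get(value, 0.0) for value, count in q_vec.items())
    q_norm = sqrt(sum(count * count for count in q_vec.values()))
    return dot / q_norm if q_norm else 0.0

def interaction_features(question, user, families):
    """families: family name -> list of attribute names shared by question and user.
    question: attribute -> {value -> count}; user: channel -> attribute -> {value -> count}."""
    features = {}
    for family, attrs in families.items():
        for q_attr in attrs:
            for channel, channel_attrs in user.items():
                for u_attr in attrs:   # only same-family pairs are matched
                    key = (family, q_attr, channel, u_attr)
                    features[key] = modified_cosine(question.get(q_attr, {}),
                                                    channel_attrs.get(u_attr, {}))
    return features

question = {"title": {"football": 1, "technology": 1}}
user = {"answered": {"title": {"football": 3}}, "asked": {"title": {}}}
feats = interaction_features(question, user, {"text": ["title"]})
print(feats[("text", "title", "answered", "title")])   # 3 / sqrt(2)
```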
However, one can observe additional activities in the system that do relate to the matching of a particular user to a certain question. For example, some users are simply more active answerers than others; similarly, questions that have already received several answers are less attractive to users who shoot for best-answer votes. We therefore add such question- and user-specific biases as features to each question-user pair. For questions, we add the current number of answers, stars and votes at the engagement time of the matched user with the question.

In order to train the classifier we need both positive and negative examples of question-user pairs. Here, positive examples are pairs for which it is known that the user can answer the question well, namely questions for which the user provided the best answer. A common practice is to randomly sample negatives from the Cartesian product of users and questions, assuming that almost all pairs are negative. Yet, the vast majority of such pairs are easy to tell apart; as an example, it would be trivial to identify that a question is a poor match for a user active in a completely different domain. Hence, while a classifier trained on such data would perform almost perfectly in distinguishing positive from negative examples, the learned classifier would not be helpful in the more interesting cases. This required effort from us to generate negative examples that are close to the true classification boundary. Thus, aiming at a better negative example set, we randomly sampled negative examples as user-question pairs at a finer level of granularity, within a category. More details on the construction of the training set and a corresponding test set are provided in Section 5.
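The training-set construction described above boils down to two ideas: sample negatives that are hard (user-question pairs drawn within the same category rather than from the full Cartesian product), and attach question-specific bias features such as the number of answers, stars and votes the question had at engagement time. The sketch below is only a guess at that procedure based on this description; the field names and data layout are hypothetical.

```python
import random

def sample_negatives(positives, users_by_cat, questions_by_cat, categories, k=1):
    """positives: set of (user_id, question_id) pairs the user answered well.
    users_by_cat / questions_by_cat: category -> list of ids seen in that category.
    categories: question_id -> category.
    Negatives are drawn within the same category as each positive, so they sit
    close to the true classification boundary instead of being trivially negative."""
    negatives = set()
    for user_id, question_id in positives:
        cat = categories[question_id]
        for _ in range(k):
            pair = (random.choice(users_by_cat[cat]),
                    random.choice(questions_by_cat[cat]))
            if pair not in positives and pair not in negatives:
                negatives.add(pair)
    return negatives

def bias_features(question_snapshot):
    """Question-specific bias features, taken at the engagement time of the
    matched user (field names are hypothetical)."""
    return {"num_answers": question_snapshot["answers"],
            "num_stars": question_snapshot["stars"],
            "num_votes": question_snapshot["votes"]}

positives = {("u1", "q1")}
negs = sample_negatives(positives,
                        users_by_cat={"Sports": ["u1", "u2", "u3"]},
                        questions_by_cat={"Sports": ["q1", "q2"]},
                        categories={"q1": "Sports"},
                        k=2)
print(negs)
```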
[Figure 4: Frequency of answering time relative to question creation]

Engagement time has a substantial effect on user-question matching, since it determines the information available for the tested question, and thus the feature set presented to the classifier. In this work we considered typical user behavior: for positive examples, engagement time was taken as the time the user actually answered the question, while for each negative example it was sampled according to the empirical distribution of answering times (Figure 4). Notice that almost all answers are given within 60 minutes of the question's creation.

In our approach, user profiles (or rather their attributes) are rather static, while question profiles are highly dynamic. Hence, in the following experiments we built user profiles based on past user activity: profiles were constructed from four consecutive months of Yahoo! Answers activity logs, which were used to build the user models, and new questions were then taken from the following fifth month. The construction of these datasets is detailed in Section 5.

Our final dataset consisted of over a million examples, with an equal number of positive and negative examples. The set had hundreds of thousands of unique users, and each user was associated with an identical number of positive and negative examples. The examples were split into training and test sets so that the two sets are user-disjoint. Since the feature space is not very large, we could afford to train a GBDT classifier.
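As an illustration of this evaluation protocol, the snippet below splits examples so that train and test sets are user-disjoint and fits a gradient-boosted-trees classifier on the resulting feature vectors. It uses scikit-learn's GradientBoostingClassifier as a stand-in for the paper's GBDT implementation; the parameters are library defaults, not the (unrecoverable) settings used in the paper.

```python
import random
from sklearn.ensemble import GradientBoostingClassifier

def user_disjoint_split(examples, test_fraction=0.2, seed=0):
    """examples: list of (user_id, feature_vector, label).
    Users are assigned wholly to train or test, so the sets are user-disjoint."""
    users = sorted({u for u, _, _ in examples})
    random.Random(seed).shuffle(users)
    test_users = set(users[:int(len(users) * test_fraction)])
    train = [(x, y) for u, x, y in examples if u not in test_users]
    test = [(x, y) for u, x, y in examples if u in test_users]
    return train, test

def train_gbdt(train):
    X, y = zip(*train)
    clf = GradientBoostingClassifier()   # stand-in for the paper's GBDT
    clf.fit(list(X), list(y))
    return clf

# Tiny synthetic usage example (feature vectors and labels are made up).
examples = [("u1", [0.9, 1.0], 1), ("u1", [0.1, 0.0], 0),
            ("u2", [0.8, 0.7], 1), ("u2", [0.0, 0.2], 0),
            ("u3", [0.7, 0.9], 1), ("u3", [0.1, 0.1], 0)]
train, test = user_disjoint_split(examples, test_fraction=0.34)
model = train_gbdt(train)
print(model.predict([x for x, _ in test]))
```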
As baselines we implemented a simple and a weighted variant. In this representation, each textual term, category id and user id is a separate feature; while the simple baseline treats all interactions equally, the weighted baseline assumes that some interactions of the user with the question should carry more weight than others. The very high dimensionality of this representation prevents us from training a complex classifier such as GBDT for the baselines. Yet, it has been established that linear classifiers are among the best performers when dealing with high-dimensional classification [25]. Accordingly, the baseline models were trained using Stochastic Gradient Descent with a logistic loss, efficiently implemented by Vowpal Wabbit. Because different features have different scales, and the norms of feature vectors for different users differ by several orders of magnitude, the features of each family (category, text, user-ids) are scaled to have a squared norm of 1; note that this normalization is similar to the one applied to the MCR features in Section 4. Table 2 reports baseline accuracy under various weight preprocessing schemes (original, standardized, log, normalized); the choice of preprocessing had no significant effect on results.

Accuracy is a standard metric for classifier performance, measuring the fraction of correctly classified examples. Table 3 details the results for the two models, the weighted baseline and MCR. Although considerable work was invested in improving the baseline, the MCR model's performance turned out to be significantly better with no tuning whatsoever. This result shows the advantage of the MCR model, which integrates various recommendation signals via different interaction features.

Table 4 details the top-ranked features; each feature in the table is composed of the channel, question and user attributes involved in generating it (channels such as asker, fan and best-answerer, attributes such as title). From the table we see that, while the dominating channels are mostly related to answers, one of the highest-ranked features comes from the best-answer voting channel, suggesting a tight relation between users who participate in best-answer voting and those who could actually answer the question. This also shows the importance of splitting the attribute space into multiple channels, as otherwise this signal would have been lost.

To further analyze the MCR model, we conducted ablation tests. Table 5 describes the results of testing the classifier with the possible feature subsets, as well as the number of features in each subset.
Examining individual errors, we found cases such as a user who usually answers in domains like movies and baby names deciding to answer a question in biology. Analyzing the false positives showed that about half of them are actually not erroneous matches, as far as we can tell. We hope that an additional important benefit of our system will be an improvement in the quality (or perceived quality) of Yahoo! Answers content in the long run, with potential benefits to other question-answering systems in general.
Exploiting all available signals has been shown to be central to improving recommendation quality. This was a key driver of our approach, which uses a multi-channel model to easily support the expression of any available signal for the discovery of promising user-question pairs. We considered multiple types of relations: between a user and a question, between a question and a token, and between users and categories.

[Figure 6: Frequencies of positive (black) and negative (gray) test examples as a function of their MCR scores]